Formation and Destruction of Autocatalytic Sets 
in an Evolving Networl< Model 




Centre for Theoretical Studies 
Indian institute of Science 
Bangalore - 560012, India 

August 2003 



2 



3 



Declaration 

This thesis describes work done by me during my tenure as a PhD student at the Centre for 
Theoretical Studies, Indian Institute of Science, Bangalore. This thesis has not formed the basis 
for the award of any degree, diploma, membership, associateship or similar title of any university 
or institution. 

Sandeep Krishna 
August 2003 

Centre for Theoretical Studies 
Indian Institute of Science 
Bangalore - 560012 
India 



4 



Contents 



Introductio 





1.1 


Networks in chemical, biological and social svstems . . . . 






1.2 


Graoh reoresentation of a 


networki 








1.3 


Difficulties of creating a graoh representation 






1.4 


Structure of networksl . . 










1.5 


Dvnamical svstems on networks 










Evolution of networksl . . 










1.7 


Framework of a model in 


which the network co-evolves with other variables . . . . 




l.§ 


Extensions of the frameworki 








1.9 


The origin of life: evolution of a chemical network 






1.10 Catastroohes and recoveries in evolving networksl 




1.10.1 Innovations! 




1.11 A mao of subseauent chaoters 












2 


Definitions and Terminologv 










2.1 


Directed eraohs and adiacencv matrices . . 








2.2 


Degrees and deoendencv 










2.2.1 Degree of a node and degree distribution of a graoh 






2.2.2 Deoendencv and interdeoendencv 




2.3 


Walks, oaths and cvclesi . 










2.4 


Connected comoonents of a graohl 








2.5 


Partitioning a graoh into its strong components 






2.$ 


Condensation of a graohl . 










2.7 


Irreducible graohs and matricesi 








2.8 


Perron-Frobenius eigenvectors fPFEslI . . . 








2.8.1 The Perron-Frobenius theorem 




2.8.2 Basic subgraphs] 




2.9 


Random graphsl 



17 



26 
28 
28 
30 
30 
31 

33 

33 
35 
35 



38 



40 
41 
42 



5 



Autocataj^^ti^^ets 



3.1 AiJtor.atalvtir. sets (ACSsi) 



3.2 Relationship between Perron-Frobenius eigenvectors and autocatalvtic sets 



^j2_Eigenvectoi^£rofile,^heoi^^ 



^j^_£ore,^nd_£eryjher^,o£^_sij^ 



^^^^oi^^n^^engha^^^^TOj^im^l^^F^ 



The nrofile of PFEs when there is an ACS but only one basic, subgranh 



3.8 The profile of PFEs when there is an ACS and many basic subgraphs 



4.1 The nonulation dynamics enuationi 



4.2 Attractors of the population dynamics equation 



4.3 Attractor profile theorenni 

4.4 The attractor for a granh with no AC^ 



4..'^ The dominant ACS of a granhl 



4.6 Examples of the attractor for specific graphs 



4.7 Timescale for reaching the attractoT 
4.R Core and nerinhery of a granhl . . . . 
4J Keystone nodes 



^jragl^D^namics 



■S.l Granh dynamics rules 



Features of the granh dynamics 



5.2.1 Evolution in a prebiotic pool 



5.2.2 Coupling of population and graph dynamics: two timescales 



^.2.^ Absence of self-re plicators 



5.2.4 Selection and novelty 



5.3 ImnlementationI .... 

5.4 Results of graph evolution 



Formation and Growth of Aiitoratalvtir Sets 



fi.l The random nhasj 



6.2 The growth phase 



^^^^^nmescal^bi^row th of the dominant ACS 



6.3 Fully autocatalvtic granhs 



^^^^^^^^^^ili^^^^^^^^nd om graph being fully autocatalvtic 



^j^j2__^lusterin^,coefficie^ 



6.3.3 Degree and dependency distributions 



Contents 



7 


Destruction of Autocatalvtic Sets 












Catastroohes and recoveries in the oreanized and erowth ohases 




7.2 


Crashes and core-shifta 










7.3 


Addition and deletion of nodes from a erraoh 






7.3.1 Deletion of a node: kevstone nodes | 




7.3.2 Addition of a node: innovations 




7.4 


Classification of core-shifts 










7.4.1 Comolete crashes 




7.4.2 Takeovers bv core transformine innovations 






7.4.3 Takeovers bv dormant innovations 






Timescale of crashes 












Recoveries 










7.7 


Structure of the graph just before a crash 





Robustness of the ACS Growth Mechanism 



t 



89 

89 
92 
93 
93 
95 
97 
98 
98 
100 
100 
103 
103 

105 



8.1 Variants of the model! 105 



8.2 Variable link strengthsl 106 



8.3 Neerative links: emergence of cooperationi 108 

8.4 Self-renlir.atorj ■ ■ ■ ■ 113 

8.5 Non-extremal selection I 114 



8.7 Different population dvnamics 



8.6 Variable number of nodej . 118 

120 

123 

123 



Concludinsf Remarks 



9.1 Interestinef features of the model 



9.2 implications for the origin of lifd 128 



9.3 Limitations of the model as a description of a chemical networki 129 



.4 Possible extensions of the m ode] 130 



9.5 Modeling other svstems within the same frameworki 131 



A Proofs of Propositions 



B 


Proerams to simulate the model 








B.l Variables and data tvpes . . . 








B.2 Creatine the initial random graph 






B.3 The grgph updgd 



133 

149 

149 
149 



B.4 Proeram that uses the attractor orofile theorem 




B.4.1 Determining which node to remove 


B.4.2 OutDut files 


B.5 


Program that finds the attractor numericallv 




B.5.1 Numerical method for determining the attractor 


B.R.2 Output fileJ 



C Contents of the attached CD 



Bibliography 



Acknowledgements 



Chapter 1 

Introduction 



In this thesis I will analyze a model of an evolving set of catalytically interacting molecules. The 
dynamical rules of the model attempt to capture key features of the evolution of chemical organiza- 
tions on the prebiotic Earth. Such a set of molecules is an ideal example of a 'network' - a system 
of several interconnected components. The concept of a network is often used as a metaphor in 
describing a variety of chemical, biological and social systems, as overviewed in section []~T1 Graphs 
provide a natural way of representing networks in a mathematical model. I describe how a graph 
can be used to represent a network and the difFiculties involved in creating such a representation in 
sections fL2l and lL3l Sections ll.4l ll.Sl and ll.6l discuss several studies of network systems, broadly 
classified according to whether they focus on the structure, function or evolution of networks. This 
classification is not meant to be precise - several studies could be placed in more than one category. 
In analyzing the model I will be mainly interested in the evolution of the graph representing the 
chemical network. Some issues of interest are discussed in section (LGI including the spontaneous 
growth of non-random graph structures, the effect of different graph structures on the selective 
pressures that drive the evolution and sudden mass extinctions of species. Sections [LTI and [LSI 
describe a framework for modeling an evolving network, within which some of these issues can be 
addressed. The specific model I will analyze will be an instance of this framework. Sections lOl and 
ll.lQI mention some of the interesting phenomena observed in the model and discuss its possible use 
in addressing puzzles connected with the evolution of chemical networks on the prebiotic Earth. 
Finally, I provide a 'map' of subsequent chapters in section IT.llI 

1 . 1 Networks in chemical, biological and social systems 

The term 'network' is often used to describe a variety of chemical, biological and social systems: 
The metabolism of a cell is a network of substrates and enzymes interacting via chemical reac- 
tions, ecosystems are networks of biological organisms with predator-prey, competitive or symbiotic 
interactions, the brain is a network of interconnected neurons, and the Internet is a network of 
interconnected computers. 
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R eviews of recent work on 'network s ' (IWattsl.ll999l:IStrogata.l2001 



Boss 



2002 



Albert and Barabas 



I2OO2 : Dorogovtsev and Mendes . 2OO2I. boOSaf Bornholdt and SchusteT 2OO3I ) testify to the d 



iver- 



sity of systenns for which the ternn is used: 



• Ecolo gical networks: Williams and Martinez l|200Q ). Montoya and Sole ( 2002 ) and Camacho et al 
(I2OO2I 1 h ave studied a number of food webs found in freshwater, marine-freshwater interface 
and terrestrial environments. They have constructed graph representations of these food 
webs. A graph consists of a set of 'nodes', pairs of which may be connected by 'links'. 
It is usually drawn as a set of circles (the nodes) with arrows (the links) connecting pairs 
of nodes. In the graph representations created by Montoya and Sole, the nodes represent 
species, genera and sometimes higher taxa, and links represent predator-prey interactions. 
Figure nm shows a graph for the Ythan estuary food web. It has 135 nodes and 601 links; 
each link points from a species to one of its predators. 



Transcription regi 


1 atorv 


networks ( Milo et al 


2002: 


Farkas et al.. 


2003 


; 


Maslov et al. . 


2003) 



20031 ): Transcription of a gene is the process of 



constructing an mRNA from the DNA sequence of the gene. This process is regulated by 
proteins, called transcription factors, that enhance or inhibit transcription by binding to sites 
on the DNA, usually upstream of the gene, which in turn helps or prevents RNA polymerase 
and the rest of the basal transcription machinery from binding and initiating transcription. 
The protein product of one gene may act as a transcription factor for another gene - this 
network of interacting genes is called a transcription regulatory network. 

• Biochemical signaling networks: A number of biochemical signaling pathways exist in 
cells which receive and transmit information, and perform computational functions that are 
necessary for regulating various cellular activities. These pathways are highly interconnected, 
with different pathways sharing several common components, and thus forming a signaling 
network. One example is the mitogen- activated protein kinase network, which is involved in 
regulating the cell cycle (fBhalla et al.l 

• Neural networks: The complete neural network of the nematode Caenorhabditis elegans 
has been mapped and can be represented as a graph with 302 nodes, each corresponding to 
one neuron, and n nore than 5000 link s corresponding to synaptic or gap junction con n ection s 



between neurons (IWhit 



Watts and Strogatj. |l£2^; iKoch and Laurent!. llQQqh. 



Polymer chain networks: The conformation space of a lattice polymer chain can be rep- 
resented as a graph, with the nodes representing possible conformations of the polymer 
chain and the links representing the po ssibility of transfor m ing one conformation to the other 
through local movements of the chain jscala et al. . 2001 ). Sen and Chakrabarti j2001 ) con- 
sider a different type of network - they represent a linear polymer as a regular one-dimensional 
graph with nearest neighbour links and some long range links. 



1.1. Networks in chemical, biological and social systems 
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Figure 1.1: Ythan estuary food web. Each of the 135 nodes (circles) represents a species and there 
are 601 directed links (arrows, pointing from prey to predator). This data is freely available from the 
website of the COSIN project {http://www.cosin.org). The graph has been drawn using LEDA, the 
Library of Efficient Data types and Algorithms, which is now a proprietary software distributed by 
Algorithmic Solutions Software GmbH {http://www.algorithmic-solutions.com/enleda.htm). The 
vertical positioning of each node is decided by the length of the shortest path leading to it from 
the basal species (the bottom node). 
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Figure 1.2: A graph of around 500 chemical reactions that take place in cells; image by Dennis 
Bray, Cambridge Univ. Each node represents a chemical species. Two species are linked if there is 
a reaction in which one species is a reactant and the other a product. 



1.1. Networks in chemical, bioiogicai and sociai systems 
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Figure 1.3: The protein interaction network for the yeast Saccharomyces cerevisiae. The nodes 
represent proteins. Pairs of proteins that are known to interact are linked. There are 1870 nodes 
and 2240 links. A node is coloured red if rennoving the corresponding protein is lethal for the cell, 
green if non-lethal, orange if removing the protein results in slower growth, and yellow if the effect of 
removing the protein is not known. Image available at http://www. nd.edu/^ networks/gallery.htm, 
courtesy Hawoong Jeong, Korea Advanced Institute of Science and Technology. 
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200d ): The metabolism of an 



Metabolic networks l|Fell and Wagnel |2I 
organism, such as the bacterium Escherichia coli, is typically specified by a set of chemical 
reactions involving different substrates that are catalyzed by various enzymes. Figure [Ol 
shows a graph of around 500 metabolic reactions. Each node represents one chemical species; 
a link connects two species if there is a reaction in which one species is a reactant and the 
other a product. This graph, unlike that in Figure IlTI is an undirected one - the links have 
no associated direction and are therefore drawn as lines, not arrows. 

Protein interaction networks: The prot ein-protein interact i ons in the yeast Saccharomyces 



cerevis i ae have been mod eled a s graphs bvUeonp et al.l (|200lh , IWagnerl (|200lh , iMaslov and Sneppen 



Sole et al.l pOO 




yazauez et a 1. 1 (120031) u sing data from two-hybrid assays and 
I. bood: llto et al. . 2001 ). Figure O shows the protein inter- 



other experiments ijuetz 
action map for S. cerevisiae. In the graph each node represents a protein and an undirected 
link connects two proteins if they are known to interact. There are 1870 nodes and 2240 
links. 

Internet: The Internet is a network of interconnected computers. Typically a few comput- 
ers at one physical location (say a university or a company) are connected by a 'local area 
network'. These LANs interact with each other via 'routers' or 'gate ways'. The topo logy of 
the Internet can be studied at various levels of detail. For example. Faloutsos et al.l (|l999) 
have studied a graph of the Internet at both the router level, where each node corresponds to a 
router, and at a more coarse-grained level, where each node c orresponds to a group of routers. 
Another study of the Internet at the router level was done bv lGovindan and Tangmunarunkit 
(|200oh . F igure ll.4l shows a graph of the Internet at the router level. Another representation 
of the Internet is in terms of 'autonomous systems'; each autonomous system "approxi- 
mately maps to an internet service provider (ISP) and its links are inter-ISP connections" 
jPastor-Satorras et al. . 2001 ). 

World-Wide Web: The WWW is a network of interconnected hypertext documents. The 
connections are in the form of 'hyperlinks' that can be followed to other docu ments on the 
WWW. The structure of the WWW has been studied at this level of resolution ([Albert et a I 

2o'qqI ) as well as at the level of 'sites', co llections of 



m^, iRroder et a 1. 1. l2000l: iKumar et al 



all the documents stored on the same web server ([Huberman and Adamid. Il999l ). 

Peer to peer networks: The Gnutella network is one example. It consists of a set of com- 
puters con nected to each other for the purpose of sharing files, with no central coordinating 



computer ([Adamic 



Power grid networks: Watts and Strogatz (|l998 ) have studied a graph of the electricity 
transmission grid of western USA. The nodes of the graph represent generators, transformers 
and substations, and links represent transmission lines. 




Figure 1.4: A graph of the Internet at the router level, as of September 1998. Image by Hal Burt 
and Bill Cheswick, courtesy Lumeta Corporation. 
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Transportation networks: The Indian railway sy stenn, cons i sting of a network of stations 
connected by train services, has been studied by ISen et a 1. 1 (|2QQ3l ). A similar network of 
the world's airports h as been reconstructe d fronn data on nunnber of passengers arriving and 



leaving each airport l|Amaral et all 12000 ) 



Economic networks l|Kirmanl. l2003l ): An economic system can be thought of as a network 
of interacting 'agents' (individuals, companies or even nations). Several different types of 
interactions can be visualized. For instance, one company may use the product of another 
as a raw material. In a market, individuals may interact by barter or monetary transactions. 
Stock market traders interact by buying and selling stocks and shares. 



Word co-occurrence network: li Cancho and Sold ()200lh and IPorogovtsev and Mendes 
(|200lh h ave studied a graph representing the English language. The nodes represent the 
different words of the English language and links signify that two words occur adjacent to 
each other in at least one sentence in the database of texts studied. The resulting graph has 
approximately 470 thousand nodes and 17 million links. 



Friendship networks: 



Amaral et al 



have studied the properties of a graph repre- 



senting the friendship network of a group of 417 high school students in USA. 

Sexual contact network: A graph with each node repre senting an individua l and links 
representing sexual contacts was constructed and studied by Liljeros et al. l|2001 ). based on 
a Swedish survey of 2810 people in the age range 18-74 years. 



Film actor/actress collaboration network: A graph can be constructed where nodes 
represent ac tors and actresses a n d a li nk is put between two if they have acted together 
in any film. IWatts and Strogata (jl998l ) have studied such a graph of collaborations for all 
actors listed in the Internet Movie Database {http://us.imdb.com). 

Scientific collaboration networks: A similar graph can be constructed for scientific col- 
laborations, wh ere nodes represe nt researchers and a link represents co-authorship of one 



or more papers. 



Newman 



(|2000aB h as studied such a graph of collaborations constructed 
from databases of papers in physics, biomedical research and computer science. 

Citation networks ^Segi3 Es3; Ednfi], Il998l ): It is common for research papers to cite 
several other papers. Thus a network of mutual citations interconnects research papers. 



1 .2. Graph representation of a network 
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1 .2 Graph representation of a network 

As is evident from the list above, the starting point of many studies is to model networks using 
graphs. A graph representation is useful for formulating questions about the structure and func- 
tioning of a network more precisely. Graph theory suggests several quantities that can characterize 
the structure of a graph. These can be computed for a real network and used for comparison with, 
for instance, random and regular graphs. 

The correlation of these graph theoretic quantities with the functioning of the network can also 
be studied, thus exposing, in a more precise way, the connection between a network's structure 
and its functioning. Another possibility a graph representation allows is of a cross comparison of 
very different systems - for instance, the metabolic network of a cell can be compared with the 
Internet. 

In many natural systems, the network of interactions is itself a dynamical variable: The tran- 
scription regulatory network of an organism changes as genes evolve, the World Wide Web changes 
as web pages get added and deleted, and ecosystems change as species become extinct and new 
species arise. With a graph representation of the network, one can model processes that alter the 
graph with time, either in discrete steps or continuously. 



1 .3 Difficulties of creating a grapti representation 



Representing a network as a graph is not as straightforward as it may seem at first glance. Firstly, 
it is likely that several different graphs can be drawn for a given system, each of which focus 
on different aspects of the network. Consider, for example, the metabolism of the bacterium 
Escherichia coli, which is specified b y a set of chemica l react ions involving different substrates that 
are catalyzed by various enzymes. Fell and Wag ner j200ol ) construct two different graphs from 
this information. The 'reaction graph' has a node for each chemical reaction with a directed link 
from one reaction to another if a product of the first reaction is used as a reactant for the second. 
The 'substrate graph' has one node for each substrate with a directed link from one substrate to 
another if there is a reaction in which the first substrate is a reactant and the second a product. 
Jeong et al.l (j2QQd ) represent the metabolic network of each of 43 organisms as a graph consisting 
of two types of nodes, one type for each substrate and one type for each reaction. The graph is 
bipartite with links from substrate nodes to reaction nodes and reaction nodes to substrate nodes 
but no links connecting two substrate or two reaction nodes. This graph has more information 
than the previous two but also leaves out some information, for example, the stoichiometry of 
the reactions. Thus, each of these three types of graphs encodes different information about the 
metabolic network, and each of them excludes certain information. 



Another problem is the incompleteness and uncertainty of data ()Kohnl . ll999l ). When the system 
consists of a complex network of interactions, even a single missed interaction or an erroneously 
added interaction, can drastically affect the dynamics. Food webs tend to suffer from a bias 
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ag ainst including parasites iwilliams and Martinez , 20 Q( j). Incompleteness of data is suggested 
by IWillianns and Martinez and ICamacho et al.l (|2QQ2 l) as one of the possible reasons why 

the Ythan estuary food web appears to differ, in several properties, fronn the other food webs 
they have studied. Protein interaction networks are another example. The two hybrid experiments 
that have been used to build these networks are susceptible to both false positives as well as false 
negatives, i.e., some protein in teractions may be erroneously mis sed and some non-existent protein 
interactions may be included 



1 .4 Structure of networks 



How can one characterize the structure of the Internet, the neural network of Caenorhabditis 
elegans, or the Indian railway system? Representing a network as a graph makes it possible to 
use various graph theoretic measures to characterize its structure. For instance, one can compute 
the length of the shortest path between two nodes averaged over all pairs of nodes in the graph. 
This quantity can be compared with that expected for a 'random graph'. Random graphs - graphs 
where each pair of nodes has a probability, r>, to be connected by a link (see section IZ9l for a 
more precise definition) - were introduced bv lErdos and Renvil ()l959l ). and have several known 



structural characteristics. In the limit where the number of nodes of the graph, N, tends toward 
infinity, then if p < (equivalently, if the number of links / < the graph consists of several 
connected clusters of nodes, each of which has only a finite number of nodes, and if p > 1/N 
for ^ > a 'giant' connected cluster exists which has an infinite number of nodes l|Bollobasl. 

2QQlh . The average shortest path length for an undirecte d random graph of nodes, with enough 

At the other extreme are regular 



links to contain a giant cluster, grows as InA^ ()Bollobas 
graphs, such as a one-dimensional chain of N nodes whose average shortest path length scales in 
proportion to (for a d-dimensional hyp er-cubic la t tice it wou Id scale as N^/'^). 
Graphs of the Indian railway network 



Em). 



20031 ). the western USA power g rid network, 
the movie actor network, the neural network of C. elegans (IVVatts and Strogatzl. llS98), the 



world airport netwo r k and polymer chain networks (jAmaral et al.L boOC ). and ecological networks 
( Montoya and Sole . l2002 ) are known to have low average path lengths similar to random graphs 
with the same number of nodes and links. 

However they differ from random graphs in other characteristics, like 'clustering'. A set of 
nodes is 'clustered' if in some sense the nodes are more 'strongly connected' to each other than to 
other nodes of the graph outside the set. One measure of the amount of clustering in a graph is the 
'clustering coefficient', which is the probability that two neighbours of any node are also neighbours 
of each other. Regular graphs have a relatively large clustering coefficient, while random graphs 
have a relatively low coefficient. 

The networks mentioned above, which have a low average path length, have much higher 
clustering coefficients than random graphs with the same number of nodes and links. Graphs of 
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this type which have a comparable average path length but a much higher clustering coe f 
than a simila r random graph have been dubbed 'small-world' graphs (fWatts and Strogatz 



icient 



Another measure is the degree distribution of a graph. The degree of a node (the total number 
of outgoing and incoming links to a node, see section IZ2|) takes a range of values over all the nodes 
of any graph. The distribution of these values can be analytically calculated for the random graph 
described above, and is a binomial distribution. Given a graph of a real network, say the Internet, 
one can compute its degree distribution and compare with the binomial distribution expected for 
a random graph with the same number of nodes and links. In contrast, a regular graph, such as 
the lattice of a crystal solid, has nodes whose degrees take at most a few different values. For 
example, in a body-centred cubic lattice (such as that formed by iron or Cs CI) each node (atom) has 
a degree eight. Many net v yorks such as the metabolic network of E. coli Jjeong et al. . 20od ). the 



Internet 11 Fa 



I999I ) and citation networks (Se glenl. I1992I: iRednerl. Il998l ) have neither 



the binomial degree distribution of a random graph nor the discrete distribution of a regular graph. 
I nstead their degree d i stribu ti on has a power law t ail. S uch networks have been termed 'scale-free' 
( Barabasi and Albert!. Il999l ). iMontova and Sold ()2002l ) claim th at ecological ne t works also have 
a scale-free degree distribution, a conclusion that is disputed bv ICamacho et al.l (1200211. Severa l 



prope rties of scale-free graphs have been elucidated ([Newman et a 



2mil 



Bollobas and Riordan 



20031). For instance, scale-f ree graphs are small-world, havin g a small aver age shortest path length 



IIBarabasi and Albert 



and a high clustering coefficient (|Wattsl.ll999l ). However, the converse 



is not true: lAmaral et al.l (|200Q| ) show that not all small-world networks are scale-free - among 
several different networks that are small-world, the degree distribution of some have an exponential 
tail like random graphs, some are scale-free, while others have a power-law distribution which is 
truncated by an exponential or a gaussian tail. 

Yet another measure used to characterize a graph is its eigenvalue spectrum. Any graph with 
nodes can be specified by an x 'adjacency' matrix. The collection of eigenvalues of this matrix 
is the eigenvalue spectrum of the graph. Some structural characteristics of a graph are reflected in 
the eigenvalue spectrum. For instance, in a directed graph, the largest eigenvalue is directly related 
to the presence or absence of cycles in the graph (see section [Z8|l . Though only some aspects 
of the connection between the spectrum and the structure of the graph have been worked out, 
it nevertheless serves to classify graphs into different groups. Once again, some properties of the 
spectrum of a random graph are known. For an undirected random graph, with enough links to 
contain a giant cluster, the value of the largest eigenvalue scales as pN, when ^ 00 , while the 
distribut ion of the rest of th e eigenvalues converges to a semi-circle whose width is proportional 
to y/N (|Farkas et al.l. boOll ). Therefore, the amount by which the eigenvalue spectrum of a real 
network deviates fr om t his 'semi-c i rcle la w' is a measure of the non-randomness of the graph. 



Farkas et al.l (|200lh and 



Goh et al.l (|200lh have shown that the eigenvalue distributions of sparse 



random graphs and scale-free graphs are very different from the semi-circle distribution. 
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The clustering and degree of a node are 'local' quantities, in the sense that they depend only 
on the immediate links of the node or at most the links of its neighbours. In contrast, shortest 
path lengths, eigenvalues and eigenvectors are 'non-local' quantities, in the sense that the shortest 
path between two nodes, or the component of an eigenvector corresponding to a node, depends on 
the entire graph and not just on the immediate links of the nodes in question. It is therefore not 
surprising that the possibility of missing or extra links is a more serious problem for the non-local 
quantities than for the local quantities. The addition or removal of a small number of links will 
not significantly affect the degree distribution or the clustering coefficient. In contrast, adding 
a small number of r a ndom links to a regular graph reduces the average path length drastically 
( Watts and Strogata. llQQa ). Similarly, the addition of a single link can create a cycle, where there 
was none before, thereby changing the largest eigenvalue. 

Another non-local graph-theoretic measure is the dependency of a node, which, for a directed 
graph, is defined to be the number of links that lie on paths leading to the node in question 
( Jain and Krishna. Il999l ). The average dependency of nodes of a graph is termed the 'interdepen- 
dency' of the graph because it is a measure of how interdependent are the nodes of the graph. 
Section [Z2l will discuss more about these measures. 

A structural characteristic that is of interest, especially for chemical networks, is the pr operty 



of aut o catalysis. The co ncep t of an autocat alytic set of chemical species was introduced bylEigen 



mm, 



Kauffman 



()l97lh and 



Rossler 



and is defined in that context to be a set of molecular 



species that contains, within itself, a catalyst for each of its member species. This notion is closely 
related to that of positive feedback loops and can be generalized to different systems which have 



Lee et a I 



other kinds of 'benefic i al' inte ractions - for instance, symbiosis in ecological networks. iHong et al 
(mi and 



( 19971 ) have each constructed a pair of different molecules that catalyze 
one another's ( a s well as t heir own) produc tion and hence form an autocatalytic set of this kind. 



Wachterhause 



(|l99nh 



and 



Morowitz et a I 



1 lilQfl.) 



use a slightly different, but related, definition 



of autocatalysis to argue that the reductive citric acid cycle is autocatalytic. A graph-theoretic 
definition of autocatalytic sets (|jain and Krishnal. is discussed in section ITll Autocatalytic 

sets will play a major role in the dynamics of the model studied in this thesis. 

Seve ral other structura l characteristics of graphs that have this non-local character have been 



studied: 



Dhar et al 



have analyzed the structure of the sh o rtest p ath spanning all nodes in a 
variety of graphs derived from regular lattices: Uanaki and Guptel (|2QQ3l ) define the "weight-bearing 
capacity" of a branching hiera rchica l network and analyz e which network structures enhance this 



capacity; 



Hartwell et 



and 



Lauffenburgen (|2QQQl ) argue that cellular networ ks are 'modu 



l ar', i.e ., consisting of distinguishable clusters of nodes that play some functional role; 



Ravasz et al 



e meta 
. (I2OO2 


Dolic r 
) and 


etwork of E. 
Maslov et al. 


coli ma 
J2OO3) 



( 2002l ). IShen-Orr et al.l (|2002l ) and lMaslov et al.l (|2003l ) describe a way of decomposing the tran- 
scription regulatory network of E. coli into commonly occurring 'motifs', which are specific patterns 
of connections between a small set of nodes. 



1 .5. Dynamical systems on networks 



21 



1 .5 Dynamical systems on networks 



A different set of questions about such networks concerns the dynannics of variables 'living' on 
the graph - for example, the concentrations of different chemical substrates in a cell, or the 
populations of species in an ecosystem, or the number of times a document on the WWW is 
accessed. The network topology affects the dynamics of these variables. The food-web structure 
of an ecosystem affects the dynamics of its constituent species' populations; the network of human 
contacts influences the spread of a contagious disease. 

Such dynamical systems on networks are often modeled as a set of coupled differential equa- 
tions in which the couplings are specified by the network of interactions. An example is a network 
of coupled (possibly nonlinear) oscillators where one common question asked is whether groups of 
oscillators eventually synchronize despite different natural frequencies and initial phases. The syn- 
chronization properties of coupled dynamica l sys tems have been studied for netvyork structures rang- 
ing fr om a fully connected graph Jstrogatz . 2QQQI ) to regular lattices jKaneko , 198S : Perez et al. . 

JCade et al.l,ll995l), and a variety of sparse network structures l|Jalan and Amritkan. 



1996fl to trees dGade et al 



2QQ3l:ISinha. 2002). _Ramaswamy ( 1997t) has described the synchronization of coupled strange non- 
chaotic attractors. The stability of th e synchronization to perturbations of the coupling strengths 



has been studied by lChen et al 



for a large class of coupled maps and differential equations. 
In ecosystem models, the Lotka-Volterra equation, replicator equation and other more com- 
plicated differential equations have been used to represent the dynamics of the populations of 
species (jProssel and McKand . l2Q03l ). The replicator equation, for example, displays a range of 
different behaviour depending on the netwo rk structure - fixed point, limit cycle, heteroclinic cycle 
and chaotic attractors have been observed I Hofbauer and Sigmundl 198£). Bioche mical signaling 



networks too exhibit a variety of dynamics (jBhalla and lyengan . il999l: iBhallal 



The dynamics of neural networks has been extensively studied. One example is a model of 
a neural network that exhibits a periodic cycling between different meta-stable states ("mem- 
ories") as well as intermitten t tr ansitions to a high ac t ivity s tate resembling epileptic seizures 
( Biswal and Dasgupta . 2002al lbl). Isinha and Chakrabarti ( 1999 ) review what is known about syn- 
chronization in neural network models. For a comprehensive review of neural network models see 



ma) 



Some dynamical systems on networks have the property of disp laying a variety of different 

describe a simple model of 



dynamics even for the same network topology. ISuguna et al 



a biochemical pathway that can show fixed point, periodic, birhythmic and chaotic dynamics as 
the rates of the reactions comprisin g the pathway are changed, while keeping the topology of the 



pathway fixed. iBhalla et al 



(|2002h analyze a biochemical signaling pathway whose dynamics is 
either monostable or bistable depending on the parameter values. 

In some systems it is more appropriate to represent the dynamics by difference equations, rather 
than differential equations, or by cellular automata. This is especially true for systems where the 
variables of interest take discrete values and assuming them to be continuous variables is a bad 
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approximation. For instance, models of traffic on transportation networks in which particles (vehi- 
cles) are individually represented often show a different behaviour (e.g., different phase transitions 
from a free flowing state to a traffic j am st ate) from models in which a continuous traffic flow 
rate is the dynamical variable. iNagdl (|2QQ3l ) reviews different traffic models on va riou s types of 
networks. Several studies of neural networks model the neurons as cellular aut omata 
Two st udies where the dynamics occurs in discrete time steps are described by lPandit and Amritkar 
who have studied the dynamics of random walkers on a class of small-world networks, and 



Dube et al.l (j2QQ2l ). who have studied the effect of different pricing schemes on the dynamics of 
queues of people accessing an Internet web server. 

When the dynamics can be assumed to be reaching some steady state or fixed point, there may 
be short cuts to finding the s teady state without actually having to solve the equations of motion. 
For instance. lEdwards et al.l (|2QQlh describe a procedure for determining the rates of the chemical 
reactions comprising the intermediary metabolism of E. coli, assuming a steady state. Instead 
of finding the rates as the attractor of some difference or differential equations, the procedure is 
based on linear optimization of a specifically chosen function under the constraints imposed by the 
stoichiometry of the reactions. The structure of the steady state reaction fluxes can be studied as 
a function of different network topologies. 



Local search strategies are another example of dynamical systems on a network ([Adamic et a 



2QQ3I ). The Gnutella file sharing system consists of a network of computers; there is no central 
coordinating computer that has information about the entire network, in particular, which files are 
available at which nodes. Therefore, each user must use a local search strategy, an algorithm that 
uses only local information, such as the identities and connections of a particular node's neighbours, 
its neighbour's neighbours, etc., to find where a particular file is located on the network. Again, the 
network topology is crucial because, for example, the most efficient s earch algorithrns for regular 
graphs are quite different from those for random or scale-free graphs (fAdamic et al.l. 120031 ). 



Another set o f studies involving dynamic s on networks is in the field of epidenniolog y. In the 



models studied by fwatts and Strogatz 



|l££i) 



and 



Pastor-Satorras and Vespignani 



diseases 



spread faster in small-world or scale-free graphs than in regular and random graphs. Similar models 
are used to study the spread of computer viruses through the Internet. These models can suggest 
efficient immunization techniq ues by identifying which links or nodes of the graphs are co ntributing 



most to the spread of the virus (fPandit and Amritkar 



1999l:IPastor-Satorras and Vespignani...2003.). 



Close y related to this a r e studies of 'attacks' o n a network by the r e mova l of no des of the 



network. 



Albert et al 



Callaway et al 



Cohen. R.. et al.l (|200nl. l2001h show that 



(|200nh and 

scale-free networks are more robust to random attacks than random graphs but are susceptible to 
directed attacks at the nodes with the highest degree. These studies contain networks that are 
changing with time, though the dynamics simply involves nodes getting removed in a specified 
order. The next section deals with studies of evolving networks with more complicated dynamics. 
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1 .6 Evolution of networks 



So far I have focused mainly on the structure of, and the behaviour of dynamical systems on, 
fixed networks. However, real networks are rarely static. An obvious question, therefore, is how 
(and why) did a particular network evolve into the specific structure we see? For instance, has the 
structure of a network evolved to optimize certain functionality? Further, one can ask what are 
the mechanisms that cause the network to change? Is the evolution driven by Darwinian natural 
selection, or Lamarckian selection, or by other self-organizing processes? 

Some of the structural and dynamical studies in the previous two sections can be u sed to 
make some guesses about the evolution of the network. For instance, iFell and Wagner! (j20QQh 
suggest that the small-world s tructure of metabolic netwo rks may have evolved to enable a cell to 
react rapidly to perturbations. IWatts and Strogatzl ()l998l ) suggest that the visual cortex may have 
evolved into a small-world architecture because that would aid the synchronization of neuron firing 
patterns. The robustness of scale-free networks to random remova of nodes has been suggested 
as an evolutionary reason for the prevalence of scale-free networks (jAlbert et a 1. 1. bOQCl ). 

While studies of static networks do provide some insight into network evolution, it is natural to 
address such evolution-related questions in models where the network is also a dynamical variable. 
One possibility is to analyze a variety of simple models of changing networks, with different rules 
governing the dynamics of the network, and see which rules pro duce structures like s mall-w orld 
or scale-free graphs that are commonly observed in real networks: Iwatts and Strog atz ( 1998 ) de- 
scribe a model that produces a small-world graph by randomly rewiring or adding links to a regular 
graph. A similar model (where the number of nodes is fixed but li n ks are repeatedly added) that 
produces scale-free networks, is described by Mukherjee and Manna l|2QQ3 ). Several other 'growing 



network' models start from an empty network and add nodes and links at discrete time steps. The 
'preferential attachment' rule - new nodes are preferentially as si gned links to nodes with a high 



degree - produces scale-free gra phs jsarabasi and Albert 1999 ) 



Albert and Barabasi 



JaOoJ anc 



Dorogovtsev and Mended (|2QQ2l ) review several different models of this type, using both determin- 



isti c and stochast ic rules, that give rise to scale-free graphs. The growing network model described 
by 



Manna 



20031 ) adds another level of complexity by making the addition of new links to the 
network dependent on the dynamics of several random walkers on a regular lattice. Variants of 
these models exist where the preferential attachment to nodes of high degree competes with the 
preferential atta chment to no d es of a lesser age, i.e., nodes that have been added to the network 

have shown that such aging effe cts can result in a graph 



more recently. 

|POW( 

3) dis 



Amaraletal 



whose p ower-law degree distribution is truncated by an exponential tail. iDorogovtsev and Mendes 
( 2003bl ) discuss mo dels of accelerated growth of n etworks, where the rate of addition of new nodes 



increases with time. 



Williams and Martinez! (|200Q| ) describe a simple set of rules to grow a network 



that p roduces graphs remarkably similar in structure to many ecological food webs l|Camacho et al 
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One aspect missing from these growing network models is that the dynamics of the network is 
not intertwined with the dynamics of other variables. The models that include aging are implicitly 
using a dynamical variable, the age of a node, but the dynamics of that variable is not coupled 
to the structure of the graph. In many natural systems the dynamics of the network is tightly 
coupled to the dynamics of the variables living on it, and vice versa. For example, a food-web 
influences the dynamics of the species' populations, and if a species becomes extinct, the food- 
web changes. The firing pattern of neurons depends on the structure of the neural network and 
the stren gths of syna ptic connections are in turn modified by the repeated firing or non-firing of 



19951 ). There exist a number of models that incorporate this feature too. Models 



of learning and memory in neur al netvyorks ty pically have neuron firing patterns co-evolving with the 

ving ecosystem models have 



network conn ection strengths iArbib . 1995). A number of such evo 



been studied (iLindgren and Nordahll 



Drossel and McKane 



mi 



Sole and Manrubial.ll996 



Chowdhurv et a 1. 1.1200.^ (see 



20031. fo r a review). Evolving replica tor networks have been described by 



Hannel and Stadlerl jlQQRh andlTokita and Yasutomil (|200.^ 



In addition to creating evolving network models that produce graphs having a similar structure 
to existing networks, one can also use evolving network models to address a number of other 
evolution-related questions. For instance, how does the existing structure of a network influence 
its subsequent evolution? This question is best addressed in a model where the evolution of 
the network is coupled to the dynamics of other variables, which, in turn, are affected by the 
network structure. Because of this coupling, the different (sub)structures in the network affect its 
evolution. For instance, autocatalytic sets in prebiotic chemical networks might have been more 
stable to perturbations because of their ability to self-replicate. If so, an interesting question is: Can 
such stable structures grow, or spread through a network and, if they can, over what timescales? 
Further, one can ask how the existing structures determine the short and long term effects of a 
perturbation on the evolution of the network; in an ecosystem, whether an invading species will be 
able to survive, and for how long, will depend on the competition and resources provided by the 
existing species. 



A related set of (meta)questions can be asked about the 'evolvability' of networks. iKirschner and Gerhart 



( 19981 ) have defined the evolvability of a biological organism as the capacity to generate heritable, 
selectable phenotypic variation. This definition can be extended to other types of networks. Then 
one can ask: How evolvable is a network? Is this evolvability also subject to selection and has it, 
therefore, itself evolved over time? 

One of the inspirations for the model I will discuss in this thesis are the models of Kauffman, 
Farmer, Fontana and others who have explored questions about self-organization, the origin of life 
and some of the above evolvability issues in work on artificial chemistries. An artificial chemistry is 
a system whose component s 'react' with each oth er in a way analogou s to molecules participating 



in chemical reactions. Thus. lKauffman 



and 



Bagley et al.l (|l99lh consider systems composed 



of strings of arbitrary length made from a binary alphabet. Strings can participate in 'cleavage' 
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and 'ligation reactions' which involve the splitting or concatenation of strings to fornn new ones. 
Just as with a real chemistry, not all reactions will be allowed and there are different ways of 
specifying which strings participate in which reactions. The sinnplest way is to randomly decide 
which reactions are allowed, resulting in a 'random chemistry' of the type studied by Kauffman. 
He found that there is a critical number of links, such that if a random chemistry has mo re than 
that number of links, it is almost certain to contain an autocatalytic set ijKaufFman . 



Baglev et al.l ()l99lh have explored an artificial chemistry consisting of catalyzed ligation and 



cleavage reactions. Their network evolves with time as new molecules can be created by the 
ligation of existing ones. They address the puzzle of how large molecules with highly specialized 
functionality, like enzymes and DNA, could have evolved from an initial condition that contained 
only small molecules which interacted weakly and non-specifically. Again, autocatalytic sets play 
an important role in the overall dynamics and are suggested as one of the possible means by 
which a c omple x ch emical organization a 'me tabolism', could have evolved on the prebiotic Earth. 



Fontanal (|l99lh and lFontana and BussI (|l994n study more abstract artificial chemistries consisting 



of functions expressed in the lambda-calculus. Each function can 'react' with other functions, 
producing a new function by the composition of the 'reactant' functions. They find that in such 
a chemistry there is a spontaneous emergence of groups of cooperative reactions such as self- 
replicators, self-replicating sets, autocatalytic cycles, symbiotic and parasitic functions. 

If Darwinian natural selection is driving the evolution of the network one can ask, what were 
the selective pressures acting on the network at different times? How do the selective pressures 
change with the structure of the network? Whenever the dynamics of the network has a stochastic 
component the detailed structure of the network is typically a result of many chance events. 
Nevertheless, some patterns may still be predictable. Thus, it is of interest to ask which patterns 
are predictable and which are a result of historical accidents. 

Another aspect of the dynamics of evolutionary systems is the destruction of structures. In 
evolving systems, non-random structures not only emerge but also get destroy ed in some circum- 
stances. A number of mass extinctions are documented in the fossil record (Newman and Palmer. 



1999 ^. Financial markets suffer sudden large crashes ([Bouchaudl . ,2001: Jo hansen and Sornette . 



2QQlh . Discovering markers that can be used to predict an imminent crash would obviously be very 



useful in the context of financial markets. It is of interest to try to identify the possibly multiple 



h crashes. 


Many me 


chanisms 


have 


I^Mavnard- 


Smithl. lis 


sd: iGlenl. 


1994) 



IS one 



example of attempts to explain the distribution of extinction sizes. iTokita and Yasutomil (|l999l ) 
have studied various statistical properties of the extinction events in their model of an ecosystem. 
In ecosystems there can exist 'keystone' species whose extinctions are likely to trigger a cascade of 
further extinctions jpainel. Il96d : jjordan et al.l . [l99d : ISole and Montoval . [200ll ). This raises several 
interesting questions: Knowing the structure of an ecological network can one predict which species 
are keystone? Are there any mechanisms that exist to 'protect' such important species? 
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In this thesis I will try to address some of these issues within the context of a specific mathemat- 
ical model of an evolving network. Many of the phenomena exhibited by the model are reminiscent 
of the phenomena seen in a variety of evolving networks - the growth of non-random structure 
(in the form of autocatalytic sets), the growth of interdependence and cooperation between nodes 
of the network, sudden mass extinctions due to the extinction of keystone species or the creation 
of certain new structures in the network (termed 'innovations'). The model provides a simple 
mathematical framework within which to analyze the mechanisms that produce such phenomena. 
The basic model, and its variants, will be specific instances of a broad framework for modeling 
evolving networks that I present in the following section. 

1.7 Framework of a model in which the networl< co-evolves 
with other variables 

Consider a process that alters a network, represented by a graph, in discrete steps. The series of 
graphs produced can be denoted C„,n = 1,2, . . .. Each step of the process, taking a graph from 
Cn-i to Cn, will be called a 'graph update event'. The following framework defines a class of 
evolving network models that consist of a graph evolving, by a series of graph update events, along 
with other system variables: 

Variables: The dynamical variables are a directed graph C and a variable Xi associated with each 
node i of the graph. For example, Xi could represent: 

• the concentrations of substrates and enzymes in a metabolic network, 

• the expression level of genes in a genetic regulatory network, 

• the populations of species in an ecosystem, 

• the state (firing or not firing) of neurons in the brain, 

• the number of times each web page on the world wide web has been accessed, 

• the number of films each actor/actress has worked in, 

• the profits of each of a set of interacting companies. 

Initialization: To start with, the graph C and the variables Xi are given some initial values. The 
particular choice of values will depend on the system being modeled. 

Dynamics: 

Step 1: First, keeping the graph [Cn-i at step n — 1) fixed, the Xi are evolved for a specified 
time T according to a set of differential equations that can be schematically written in 
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the form: Xj = fi{Cn-i,xi,X2, ■ ■ ■), where fi are certain functions that depend upon 
the graph C„_i and on all the Xj variables. 

Step 2: After this, some nodes may be removed from the graph, along with their links. Some 
links may also be removed from the graph. Which nodes and links are removed will, 
in general, depend on the values of the nodes and the graph C„_i. 

Step 3: Similarly, some nodes may be added to the graph. These new nodes will be assigned 
links with existing nodes. The rules for specifying which links will be assigned may 
depend on the Xi values of the other nodes and C„_i. The new node will be given 
its own Xi variable, which will be assigned some initial value. Some new links may be 
added between existing nodes, which again may depend on the Xi values of the nodes 
and Cn-i- 

Steps 2 and 3 produce a new graph which will be the graph at time step n, C„. This process, from 
step 1 onward, is iterated. 

The above provides a nonequilibrium statistical mechanics framework for a model of an evolving 
network. The graph is generically in a nonequilibrium state because of the constant flux of nodes 
and links into and out of the graph. The Xi variables may also be in a nonequilibrium state 
depending on the form of of fi. The graph dynamics in this framework is a specific case of a 
Markov process on the space of graphs. For a general Markov process, at time n — 1, the graph 
Cn-i determines the transition probability to all other graphs. The stochastic process picks the 
new graph for time n, C„, using this probability distribution. In the example here, the transition 
probability is not specified explicitly. It arises implicitly as a consequence of the dynamics of the Xi 
variables (step 1) and the way the specified rules for steps 2 and 3 use the Xi values to determine 
which nodes and links will be removed or added. 

There are two timescales built into this framework. On a timescale much shorter than T, the 
Xi variables can evolve while the graph remains fixed. On a longer timescale, the graph changes 
in discrete steps that involve the possible removal and addition of nodes and links. The operation 
of the Xi dynamics and the graph dynamics on different timescales is common to many systems. 
In the brain, over short times neurons fire or not depending on their fixed synaptic connections 
and the firing of other neurons, while over a long timescale the firing pattern can cause the 
synaptic connections to strengthen or weaken. In an ecosystem, over short timescales the species 
composition is fixed while the populations change, and over long times the network changes because 
of the extinction and mutation of existing species and the invasion of the system by new species. 
In a genetic regulatory network, over short times the genes and regulatory interactions are fixed 
and it is the expression level of genes that changes with time, and on a longer timescale the genes 
themselves evolve. These examples also reiterate what was mentioned before - the Xi dynamics 
and the graph dynamics are interdependent. This feature is also built into the above framework: 
The dependence of the Xi dynamics on the graph lies in the dependence of the functions fi on C. 



28 



Chapter 1 . Introduction 



The dependence of the graph dynamics on Xi lies in steps 2 and 3. The particular choices of fi as 
well as the rules determining which, if any, nodes and links will be removed or added will depend 
on the particular system that is being modeled. 



1 .8 Extensions of the framework 



This framework can be extended to include a larger class of evolving network models. Firstly, as 
mentioned in section flTsl it may be preferable to model some systems using difference equations or 
cellular automata. For instance, if the populations of all species in an ecosystem can be assumed 
to be large then treating them as continuous variables and representing their dynamics by differ- 
ential equations may be reasonable. But if the populations are close to zero this could be a bad 
approximation. It is easy to alter step 1 to take into account these possibilities. Secondly, it may 
be more appropriate to model some systems using two or more graphs, or by some generalization 
of a graph like a hypergraph. For example, in a cell, the metabolic network is linked to the genetic 
regulatory network and the dynamics on one network can affect the dynamics on the other. A 
number of neural networ k mod els of learning consist of two interacting networks, the 'teacher' 
and the 'student'; Kinzel (|2QQ3l ) has reviewed several models that consist of more than one neu- 



ral network interacting with on e another. The framework described above can be generalized to 



deal with such systems: iKohn 



as described a generalized form of a graph that he uses 
to represent all the different types of interactions in the mammalian eel cycle contr ol and DNA 
repair systems. Thirdly, as is the case in some neural network models (|Arbihl. 11993 ). it may be 
worthwhile to consider dynamical rules where the network changes continuously with time, rather 
than in discrete steps as in the above framework. 



1 .9 Ttie origin of life: evolution of a chiemical network 



In this thesis I will describe a model - a particular instance of the above framework - whose rules 
attem pt to describe some aspec ts of the dynamics of a chemical network in a pool on the prebiotic 
Earth ijjain and Krishnal. 199E ). Assume that the pool contains many amino acid monomers or 
nucleotide bases, as well as s ome sniall p olypeptide chains or short RNA molecules which may 
have a weak catalytic activity jjoyce . 1989 ^. For instance, some of these polypeptides or catalytic 
RNA might catalyze the production of other polypeptides or RNA from the monomers present in 
the pool. Such a chemical network can be represented as a graph in which the nodes are the 
polypeptides/RNA and the links represent their catalytic interactions. This would be the graph 
C in the model. The Xj could represent the concentrations or relative populations of each type, 
or species, of polypeptide or RNA. Xi would change with time as the possible chemical reactions 
proceed to produce or deplete the different molecular species. Focusing only on the catalyzed 
reactions which produce different molecular species from the monomers, one can imagine that for 
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a short timescale, over which the pool remains undisturbed, the chemical network (i.e., C) will be 
fixed with no molecular species entering or exiting the pool. Thus, the Xi will evolve according to 
the chemical rate equations, with C fixed. In section l4~Tl I derive the form of fi for such chemical 
rate equations under the following assumptions: 

• only reactions involving the catalyzed production of each molecular species are modeled, 

• the catalyzed reactions are described by the Michaelis-Menten theory of enzyme catalysis, 

• the Michaelis constants for each reaction are very large compared to the populations, 

• the concentrations of all the required reactants are non-zero and fixed, 

• all catalysts have the same strength, 

• the pool is well-stirred, i.e., there are no spatial degrees of freedom. 

Over long timescales imagine that the pool is subject to perturbations in the form of floods, tides 
or storms which may flush out part of the pool and possibly introduce new molecules into the pool. 
Removing a node from the graph would correspond to removing all the molecules of a particular 
type from the pool. When a perturbation removes a random lot of molecules, the molecular species 
with smaller Xi are more likely to be completely wiped out from the pool. For step 2, I will choose 
a rule that implements an extreme version of such selection: the node with the least Xi will be 
removed along with all its links. A perturbation can add a new node to the graph, with a small Xi 
value. An added node corresponds to a new type of molecule being brought into the pool by the 
perturbation. As there is no reason to assume such a node would have any particular relationship 
with existing nodes it is reasonable to assign the links of the new node randomly. Therefore, for step 
3, I choose the rule: add one new node, whose links with existing nodes are assigned randomly. 
In this scenario one can assume that the perturbation would not remove existing links without 
removing a node, or create a new link between existing nodes. Therefore, these possibilities can 

be excluded from the model. A detailed description of the model rules is given in chapter 5. 

Such a model could address some questions concerning the origin of life on Earth jjain and Krishna 



2QQlh . The chemical network of a bacterial cell of today consists of several thousand types of 



molecules involved in thousands of different chemical reactions. Each type of molecule plays a 
definite functional role and the entire chemical network is organized to enable the cell to perform 
the various functions it needs to survive and reproduce in often extreme conditions. This chemical 
organization is highly non-random - the molecules found in cells are a small subset in a very large 
space of possible molecules and the graph that describes their interactions is a special kind of graph 
in the very large space of graphs. The probability of such a structured chemical organization arising 
by pure chance is extremely small. After the oceans condensed about 4 billion years back on the 
prebiotic Earth there was no such chemical network of interactions existing anywhere. However, 
if we assume that life originated on Earth about 3.5 to 3.8 billion years ago as suggested by the 
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microfossil evidence (jSchopfl. then that leaves a few hundred million years for the first living 

cells to form. A puzzle of the origin of life on Earth is: how did such a structured, non-random 
chemical organization form in such a short time? This question can be addressed in the above 
model. One can start with a random graph, representing the situation just after the oceans would 
have condensed on Earth, where the pool would be likely to contain a random set of molecules with 
no particular non-random chemical organization. As I will show, starting from this initial state, 
the chemical network i n the model descr i bed a bove evolves inevitably into a highly non-random, 
autocatalytic network (jJain and Krishna. Il99£ . The mechanisms that cause this growth 



might throw light on the above puzzle. 



1.10 Catastrophes and recoveries in evolving networl<s 



The model I will analyze also exhibits sudden catastrophes - mass extinctions where the relative 
populations of many species fall to zero in a short time - which are followed by slow recoveries. 
The asymmetry between catas trophe and recov e ry time s, and fat tails in the size distribution 



of catastrophes in the model Ijain and Krishnal. l2QQ2ah are also observed in catast rophes and 
recoveries seen in the fossil rec o rd IINewman and PalmerlllQQOl ) and in financial markets (Bouchaud 



120011: 
matter 



Johansen and Sornettd. 1200111. The mec h anisrns that cause such mass extinctions are a 



of much debate (|Mavnard-Smith I, liQad: IgiU. I1994I ). The model of iBak and SneppenI 



( 1993 ) is an attempt to explain the frequency distribution of extinction events of different sizes 
using the mechanism of self-organized criticality. The model I will study is similar in some ways to 
the Bak-Sneppen model, with the addition of another dynamical variable - the network structure 
of the system. The advantage of explicitly modeling the network structure is that the mechanisms 
causing the catastrophes and their dependence on the network structure can be analyzed in detail. 

I will show that, in th e model, the largest extinctions are caused mainly by three mechanisms 
( Jain and Krishna. l2002a ). One of the mechanisms involves the removal of specific species. This 
is reminiscent of the notion of 'keystone species' that was discussed in section (LGI Thus, in this 
model, one can give a gr aph-theoretic definition of keystone species and explain why their removal 
causes mass extinctions (|Jain and Krishnal. l2002bl ). 



1.10.1 Innovations 



Apart from the removal of species, it is interesting that 'innovations' - the creation of new structures 
in the network - can also cause catastrophes. In the co ntext of the mode l it is possible t o create 



2002hl. l2003hh. This 



a graph-theoretic definition for the term 'innovation' ijJain and Krishn; 
definition allows me to construct a hierarchy of innovations and correlate the graph-theoretic 
structure of an innovatio n to its short and long term effects on the evolution of the network 



( Jain and Krishna 



2003aB . 
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1.11 A map of subsequent chapters 

Chapters 2-4 describe different segments of the basic model, that are put together in chapter 5, 
and introduce various results from graph theory that will be useful for analyzing the model. The 
later chapters analyze the diverse phenomena observed in the model using the results from the 
previous chapters. In more detail: 

Chapter 2 introduces certain elements of graph theory. I set out the notation that will be used and 
give definitions of directed graphs, degree of a node, paths, cycles and connected components 
of a graph. I introduce the notions of 'dependency' of a node, and 'interdependency' of a 
graph. I define the Perron-Frobenius eigenvalue and eigenvectors of a graph and describe 
some of their properties. Random graphs are briefly discussed at the end of the chapter. 

Chapter 3 introduces the concept of an 'autocatalytic set'. I describe the relation between alge- 
braic properties of a graph, such as the Perron-Frobenius eigenvalue and eigenvectors, and 
topological properties, such as the presence or absence of autocatalytic sets in a graph. I then 
present the eigenvector profile theorem which, for any graph, specifies the 'profile' (which 
components are zero and which non-zero) of all possible Perron-Frobenius eigenvectors of the 
graph. The notion of a core and periphery of a Perron-Frobenius eigenvector is introduced. 

Chapter 4 presents a dynamical system whose variables 'live' on the nodes of the graph. Their 
dynamics is described by a set of coupled differential equations, where the couplings are 
specified by the graph. The equations are derived from an idealized version of the chemical 
rate equations that would describe the dynamics of catalytic molecules in a well-stirred 
chemical reactor. I show that the attractors are fixed points and for generic initial conditions 
are Perron-Frobenius eigenvectors of the graph. I present the attractor profile theorem which 
identifies the subset of Perron-Frobenius eigenvectors that are attractors of the system for a 
given graph. The 'core' and 'periphery' of a graph are defined and the notion of a 'keystone 
node' is introduced. 

Chapter 5 introduces a model of an evolving graph along the lines of section [TtI The model 
uses the dynamical system discussed in chapter 4 for step 1 of the dynamics. Some specific 
rules are chosen for steps 2 and 3 that aim to capture the way storms, floods or tides would 
remove molecular species from, and add new molecular species to, a pool on the prebiotic 
Earth. I display the evolution of various quantities - the total number of links in the graph, 
the number of nodes with non-zero relative population in the attractor, the Perron-Frobenius 
eigenvalue, and the interdependency of the graph - as a function of time for some example 
runs of the model. I also exhibit snapshots of the graph at various times for one run. Three 
'phases' of behaviour are observed in these runs which are dubbed the 'random', 'growth' 
and 'organized' phases. 
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Chapter 6 discusses the random and growth phases. Autocatalytic sets are shown to play an 
important role. I argue that no graph structure is stable for very long during the random 
phase, and that this is because there is no ACS in the graph. I show that the chance 
formation of an ACS triggers the growth phase, in which the number of links of the graph 
grow exponentially as the ACS expands by accreting more and more nodes to itself. This 
continues until the entire graph is an ACS at which point the organized phase starts. The 
timescales of appearance and growth of the ACS are analytically estimated. I show that the 
final fully autocatalytic graph is a highly non-random graph. The degree and dependency 
distributions for the fully autocatalytic graphs produced in the runs are discussed. 

Chapter 7 discusses the sudden extinctions of large numbers of molecular species observed occa- 
sionally in the organized phase. I show that the largest extinction events are 'core-shifts', a 
complete change of the core of the graph. The notion of an 'innovation' is introduced. I 
show that core-shifts can occur due to the removal of a keystone species from the graph, 
or due to a particular type of innovation (the addition of a species which creates a new 
'self-sustaining' structure in the graph), or a specific combination of both. The timescales of 
these large extinction events, as well as the recovery of the system afterward, are discussed. 
The structure of the graphs just before each large extinction, in particular their degree and 
dependency distributions, are analyzed. 

Chapter 8 discusses the robustness of the formation and growth of autocatalytic sets to changes 
in the model rules. I list various simplifications in the model rules that depart from realism but 
make the system analytically tractable. Variants of the model that relax these simplifications 
are presented and are shown to exhibit the formation and growth of autocatalytic sets. 

Chapter 9 summarizes the interesting features of the model and its variants, and discusses the 
limitations and possible extensions of the model. 

Appendix A contains detailed proofs of all propositions made in the thesis, including the eigen- 
vector, and attractor, profile theorems. 

Appendix B describes two computer programs that use different methods to simulate the model 
and its variants. The source code of the programs are provided in the attached CD. 



Appendix C describes the contents of the attached CD. 



Chapter 2 

Definitions and Terminology 



2. 1 Directed graphs and adjacency matrices 



Definition 2.1: Directed graph. 

A directed graph G = G{S,L) is defi ned by a set 5 of 'nodes' and a set L of 'l inks', where 



each link is an ordered pair of nodes ()Hararvl . Il969 



Robinson and Foulds 



The set of nodes can be conveniently labeled by integers, S = {1,2, . . . , s} for a directed graph 
of s nodes. Henceforth I will use the ternn 'graph' to refer to a labeled, directed graph. 

An exannple of a graph is given in Figure lTTb where each node is represented by a small labeled 
circle, and a link is represented by an arrow pointing from node j to node i. A graph with s 
nodes is completely specified by an s x s matrix, C = (cij), called the 'adjacency matrix' of the 
graph, and vice versa. 



Definition 2.2: Adjacency matrix. 

The adjacency matrix of a graph G 



G{S,L) with s nodes is an s x s matrix, denoted 



C 



where Cij = 1 if L contains a directed link (j, i) (arrow pointing from node j to 



node i), and oij = otherwise. 



Robinson and Foulds 



This convention differs from the usual one ([Hararv . 
19981 ) where Cij = 1 if and only if there is a link from node i to node j; the transpose of the 



adjacency matrix defined above. This convention has been chosen because it is more convenient in 
the context of the dynamical system to be discussed in chapter|4| Figure IZlb shows the adjacency 
matrix corresponding to the graph in Figure ITTb . I will use the terms 'graph' and 'adjacency 
matrix' interchangeably: the phrase 'a graph with adjacency matrix C" will often be abbreviated 
to 'a graph C". 

Und i rected graphs are special cases of di rected graphs whose adjacency matrices are symmetric 



( Hararv 



Il969l : 



Robinson and Foulds 



1980 ). A single (undirected) link of an undirected graph 



between, say, nodes j and i, can be viewed as tw o directed links of a directed graph, one from j to 
i and the other from i to j. See lBollobas jl998 ) for many results concerning undirected graphs. 
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Figure 2.1: a. A directed graph with 8 nodes. The unbracketed number adjacent to each node is 
the degree of that node, and the bracketed number is the dependency, b. The adjacency matrix 
of the graph in (a), c. A subgraph of the graph in (a) induced by S' = {3,4,5}. The adjacency 
matrix of the subgraph is the shaded portion of the matrix in (b). d. The condensation of the graph 
in (a). The dashed lines in (b) demarcate the portions that correspond to the strong components 
Ca- The only basic subgraph, C7, is shaded. 



2.2. Degrees and dependency 
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Definition 2.3: Subgraph. 

A graph G' = G'(S',L') is called a subgraph of G{S, L) if 5' C 5 and L' c L 



Hararvl.ll96S 



Bollobas 



m 



Definition 2.4: Induced subgraph. 

A graph G' = G'{S',L') is called an induced subgraph of G{S,L), or the subgraph of 
G{S,L) iriduced by S' if S' C S and cont ains all (and only) those links in L with both 



endpoints in S' ([Hararv 



iBolloba: 



mm 



The graph in Figure \2A} : is thus an induced subgraph of the graph in Figure l2?lb (induced by 
S' = {3,4,5}). For a subgraph it is often more convenient to label the nodes not by integers 
starting from 1, but by the same labels the corresponding nodes had in the parent graph. The 
adjacency matrix of an induced subgraph can be obtained by deleting all the rows and columns 
from the full adjacency matrix that correspond to the nodes outside the subgraph. The highlighted 
portion of the matrix in Figure lTTb is the adjacency matrix of the induced subgraph in Figure lTTb . 



2.2 Degrees and dependency 



2.2. 1 Degree of a node and degree distribution of a grapli 

Definition 2.5: Degree, in-degree and out-degree. 

The degree, or total degree, of a node is the total number of incoming plus outgoing links 
from that node, i.e., the degree of node i is Ylj=ii'^ji + Cji)- 

The in-degree of a node is the total number of incoming links to that node, i.e., the in-degree 
of node i is Y,'j=i '^ij- 

The out-degree of a node is the total number of outgoing lin ks from that node, i.e., the 



out-degree of node ^ is ^ 



j=l Cji 



Hararvl.ll969 : 



Bollobas 



mm 



The unbracketed numbers adjacent to each node in Figure IZTb show the degree of that node. 



Definition 2.6: Degree distribution. 

The degree distribution of a graph, denoted P{k), is the fraction of nodes with degree k. 
The out-degree distribution, Pn ,,i(k), and the in-degree distribution, Pin{k), are similarly 
defined iHararJ. llQfig: Rolloha-J. llQQ3) . 



2.2.2 Dependency and interdependency 

Definition 2.7: Dependency. 

The dependency, denoted di, of a node i is the to tal number of links in all paths that 



terminate at that node, each link counted only once ()Jain and Krishna 
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Definition 2.8: Dependency distribution. 

The dependency distribution of a graph, denoted D{d), is the fraction of nodes with 
dependency d. 

Definition 2.9: Interdependency. 

The interdependency of a graph is defined to be the average dependency: 

d = (i/s)y.L.dj. = y^ _. d X D{d) 



3in and Krishn: 



The bracketed numbers adjacent to each node in Figure IZlfe show the dependency of that node. 
The interdependency of the graph in Figure [2?lb is 9/8. 

Because di counts how nnany links ultimately 'feed into' the node i, it is a measure of how 
'dependent' node i is on other nodes. Thus d is a measure of how interdependent are the nodes 
in the graph. While the degree of a node is a 'local' measure as it depends only on the immediate 
connections of a node, the dependency is more 'non-local' in character because links far away from 
the node contribute to it. 



2.3 Walks, paths and cycles 

Definition 2.10: Walk, closed walk. 

A walk of length n (from node ii to node in+i) is an alternating sequence of nodes and 
links iilii2h ■ ■ ■ in^nin+i such that link h points from node ii to node 12 (i.e. li = (11,12)), 
I2 points from ^2 to i^, and so on. If the first and last nodes i\ and in+i of a walk are the 
same, it will be referred to as a closed walk 



iHararJ. Eh^; IPobinson and Fould-J. IiQRgI: iRolloba-J. \mk 



The existence of even one closed walk in the graph implies the existence of an infinite number of 
distinct walks in the graph. In the graph of Figure IZTb . there are an infinite number of walks from 
node 7 to node 8 (e.g., 7 — > 8, 7 ^ 8 ^ 7 ^ 8, . . .) but no walk from node 7 to node 2. An 
undirected graph trivially has closed walks if it has any undirected links at all. 

If C is the adjacency matrix of a graph then it is easy to see that {C^)ij equals the number of 
distinct walks of length n from node j to node i. E.g., {C'^)ij = X]fc=i CifcCfejl each term in the 
sum is unity if and only if there exists a link from j to k and from k to i, hence the sum counts 
the number of walks from j to i of length 2. 



Definition 2.11: Path. 

A walk v yith a 1 1 nodes distinct is a pa th 



Hararvl . 119691 : iRobinson and Foulds 



Bollobas 



mi) 



In a directed graph C, I will say node j 'has access to' node i, or node i 'has access from' node j, 
if there is a path from node j to node i, i.e., for some n > {C'^)ij ; 



'has access 


from' 


[RothblunJ, 


Il97fih 



2.4. Connected components of a graph 
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to a node i as being 'downstream' fronn a node j if j has access to i, but i does not have access 
to j. Similarly i is 'upstream' from j if i has access to j, but j does not have access to i. Thus 
in Figure l2?lb . node 5 is downstream from node 2, or equivalently node 2 is upstream from node 
5 because node 2 has access to 5 but not vice versa. Node 2 is neither upstream nor downstream 
from node 7 as neither have access to the other. Node 8 is also neither upstream nor downstream 
from 7 because each has access to the other along some directed path. 



Definition 2.12: Cycle, n-cycle. 

An n- c ycle is a closed walk with n links, and all intermediate nodes distinct 
jlHararv 



Robinson and Foulds 



Bollobas 



mm 



I will also use the term 'cycle' to refer to the subgraph consisting of the nodes and links that form 
the cycle. Thus any subgraph with n > 1 nodes that contains exactly n links and also contains a 
closed walk that covers all n nodes is an n-cycle. E.g., the subgraph induced by nodes 7 and 8 in 
Figure lZTb is a 2-cycle. Clearly any graph that has a closed walk contains a cycle. 



2.4 Connected components of a graph) 



Given a directed graph C, its 'associated undirected graph' (or 'symmetrized version') C^'^^ can be 
obtained by adding additional links as follows: for every link in L, add another link if 
the latter is not already in L. 

Definition 2.13: Strongly, weakly and unilaterally connected nodes. 
Two nodes i and j of a directed graph C will be said to be 

(i) strongly connected if i has access to j and j also has access to i, 

(ii) unilaterally connected if either i has access to j, or j has access to to i, 

(iii) weakly connected if there exists a path between them in the associated 
undirected graph C^^\ 



(iv) disconnected if none of the above is true jlHararv . 1969 



Robinson and Foulds 



It is evident that any strongly connected nodes are also unilaterally connected, and any unilat- 
erally connected nodes are weakly connected, but the converse need not be true. A graph will 
be termed strongly, unilaterally, or weakly connected if all pairs of its nodes are, respectively, 
strongly, unilaterally or weakly connected. Any directed graph can be decomposed into strongly-, 
unilaterally- and weakly-connected components which are subgraphs induced by maximal sets of 
strongly, unilaterally and weakly connected nodes (|Hararyl . ll969l:lRobinson and Fouldsl.ll98Ql ) (e.g., 
the graph of Figure lZTfe has five weakly connected components induced, respectively, by the nodes 
{1}, {2, 3, 4, 5}, {6} and {7, 8)1. By convention a single node is con s idered strongly-connected to 



itself even if there is no self-link I^Hararvl . Il969 



Robinson and Foulds 



1980 ). Therefore, if a graph 



contains a single node with no links the single node is also considered a (trivial) strong component 
of the graph. 
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2.5 Partitioning a graphi into its strong components 

The nodes of any graph can be grouped into a unique set of strong components as follows: 

1. Pick any node, say i. Find all the nodes having access from i. Denote this set by Si; it 
may include i itself. Similarly find all the nodes having access to i. Denote this set by 52. 
Denote the subgraph induced by the set of nodes {i} U {Si n as Ci. Ci is one strongly 
connected component of the graph. 

2. Pick another node that is not in Ci and repeat the procedure with that node to get another 
subgraph, C2. The sets of nodes comprising the two subgraphs will be disjoint. 

3. Repeat this process until all nodes have been placed in some Ca, a = 1, 2, ... , M . Each Ca 
is a strong component of the graph. 

Irrespective of which nodes are picked and in which order, this procedure will produce for any graph 
a unique (upto labeling of the Ca) set o f disjoint subgraphs, each of which is stron gly connected. 



encompassing all the nodes of the graph (|Hararyl . ll969l:lRobinson and Fouldg.ll98d ). The graph in 



Figure ITTb will decompose into 7 such subgraphs (comprising nodes {1}, {2}, {3}, {4}, {5}, {6} 
and {7,8}). 

One says there is a path from a strong component Ci to another strong component C2 if there 
is a path in C from any node of Ci to any node of C2. The terms 'access', 'downstream' and 
'upstream' can thus be used unambiguously for the Ca- 

2.6 Condensation of a graph) 

Definition 2.14: Condensation of a graph. 

Determine all the strong components Ci, C2, . . . , Cm of the graph as described above. Con- 
struct a new graph of M nodes, one node for each Ca, a = 1, . . . , M . The new graph has a 
directed link from Cp to Ca if, in the original graph, any node of Cp has a link to any node 
of C„. This n e w graph is called the conde nsation of the graph C 



l|HararJ.ll96g:lRobinson and Foulds 



Figure IZTU illustrates the condensation of Figure IZTfe . Clearly the condensed graph cannot have 
any closed walks. For if it were to have a closed walk then the Ca subgraphs comprising the closed 
walk would together have formed a larger strong component in the first place. Therefore one can 
renumber the Ca such that if a > /3, Cp is never downstream from Ca- Now one can renumber 
the nodes of the original graph such that nodes belonging to a given Ca are labeled by contiguous 
node numbers, and whenever a pair of nodes i and j belong to different subgraphs Ca and Cp 
respectively, then a > (3 implies i > j- Such a renumbering is in general not unique, but with any 
such renumbering the adjacency matrix takes the following canonical form: 



2.7. Irreducible graphs and matrices 
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( c. 



Co 



c 



\ 



V 



R 



Cm ) 



where indicates that the upper block triangular part of the matrix contains only zeroes while the 
lower block triangular part, R, is not equal to zero in general. The matrix in Figure IZlb is already 
in this canonical form; the dashed lines demarcate the portions that correspond to the Cq. 

2.7 Irreducible graphs and matrices 

Definition 2.15: Irreducible graph. 

A graph is te r med irreducib le if each node in the graph has access to every other node 
iHararJ. Eh^; Isenetal. llQT^l. 



The simplest irreducible subgraph is a 1-cycle. In Figure IZTb the subgraph induced by nodes 7 
and 8 is irreducible, but the subgraph induced by nodes 2, 3, 4, and 5 is not irreducible as there 
is, for example, no path from node 5 to node 2. 

Definition 2.16: Irreducible matrix. 

A matrix C is irreducible if for every ordered pair of nodes i and j there exists a positive 



integer k such that {C^)ij > 



jsenetal. 



mi) 



Thus, if a graph is irreducible then its adjacency matrix is also irreducible, and vice versa. Irreducible 
graphs are, by definition, strongly connected graphs. In fact all strongly connected graphs are 
irreducible, except the graph consisting of a single node with no self-link which is strongly connected 
(by definition) but not irreducible. 



2.8 Perron-Frobenius eigenvectors (PFEs) 

Definition 2.17: Eigenvector, eigenvalue. 

A column vector x = {xi,X2, ■ ■ ■ ,Xs)'^ is said to be a right eigenvector of an s x s matrix 
C with an eigenvalue A if for each i, Ylj=i^ij^j — ^^i- The eigenvalues of a matrix C 
are roots of the characteristic equation of the matrix: \C — \I\ = 0, where / is the identity 
matrix of the same dimensionality as C and \A\ denotes the determinant of the matrix A 



jseneta 



mm 
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A 'left eigenvector' is similarly defined to be a row vector x = {xi,X2, ■ ■ ■ ,Xs) such that for each 
X]j=i '^ji^j — ^^i'' the left eigenvectors of C are simply the transpose of the right eigenvectors 
of C^, the transpose of C. In this thesis I will only use right eigenvectors and will therefore 
refer to them simply as eigenvectors. Note that the set of eigenvalues corresponding to all right 
eigenvectors is identical to the set of eigenvalues corresponding to the left eigenvectors. In general 
a matrix will have complex eigenvalues and eigenvectors, but an adjacency matrix of a graph has 
special properties, because it is a 'non-negative' matrix, i.e., it has no negative entries. 



2.8.1 The Perron-Frobenius theorem 

The Perron-Frobenius theorem for irreducible non-negative matrices states jseneta . Il973l ): 
Suppose T is an s X s non-negative irreducible matrix. Then there exists an eigenvalue r such 
that: 

a) r is real, > 0; 

b) with r can be associated strictly positive left and right eigenvectors; 

c) r > |A| for any eigenvalue A 7^ r; 

d) the eigenvectors associated with r are unique to constant multiples; 

e) Let B be any s x s non-negative matrix. If < i? < T and /? is an eigenvalue of B, then 
|/3| < r. Moreover, = r implies B = T; 

f) r is a simple root of the characteristic equation of T. 



If the matrix is not irreducible a weaker form of the above the orem holds. The Perron-Frobenius 
theorem for a general non-negative matrix states ( Senetal . 1973 ): 

Suppose T is an s X s non-negative matrix. Then there exists an eigenvalue r such that: 
a') r is real, > 0; 

b') with r can be associated non-negative left and right eigenvectors; 

c') r > |A| for any eigenvalue A 7^ r; 

e') Let B be any sxs non-negative matrix. \iO < B <T and /3 is an eigenvalue of then < r. 



Definition 2.18: Perron-Frobenius eigenvalue. 

For any graph C, the eigenvalue of C that is real and larger than or equal to all other 
eigenvalues in magnitude will be called the Perron-Frobenius eigenvalue of the graph and 
denoted by Ai(C). The Perron-Frobenius theorem guarantees the existence of such an 
eigenvalue. 

Definition 2.19: Perron-Frobenius eigenvector (PFE). 

For any graph C, each eigenvector corresponding to Ai(C) consisting only of real and non- 
negative components will be referred to as Perron-Frobenius eigenvector (PFE). The Perron- 
Frobenius theorem guarantees the existence of at least one such eigenvector. 



2.8. Perron-Frobenius eigenvectors (PFEs) 
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The Perron-Frobenius eigenvalue of the graph in Figure lZTb is 1 and x = (0, 0, 0, 0, 0, 0, 1, 1)^ is 
a Perron-Frobenius eigenvector. 

The presence or absence of closed walks in a graph can be determined from the Perron-Frobenius 
eigenvalue of its adjacency matrix: 

Proposition 2.1: If a graph, C, 

(i) has no closed walk then Ai(C) = 0, 

(ii) has a closed walk then Ai(C) > 1. 

The proof of this and all subsequent propositions can be found in appendix A. 

Note that Ai cannot take values between zero and one because of the discreteness of the entries 
of C, which are either zero or unity. 



2.8.2 Basic subgraphs 

In section it was shown that the adjacency matrix of any graph can always be written in the 
following canonical form by an appropriate renumbering of the nodes: 



c 



Co 



From the above form of C it follows that 



\ 



Cm J 



\C - \I\ = \Ci - XI\ X \C2- \I\ X ... X \Cm - A/|. 

Therefore the set of eigenvalues of C is the union of the sets of eigenvalues of Ci, . . . , Cm, which 
implies that Ai(C) = maxQ,{Ai(CQ,)}. Thus, if a given graph C has a Perron-Frobenius eigenvalue 
Ai then it contains at least one strong component with Perron-Frobenius eigenvalue Ai. 

Definition 2.20: Basic subgraph. 

Each strong component s of C with Perr on-Frobenius eigenvalue equal to Ai(C) is referred 
to as a basic subgraph (|Rothblum . 1975 ). 



The shaded node in Figure l2?ll j corresponds to the only basic subgraph of Figure lZT 



Proposition 2.2: If all the basic subgraphs of a graph C are cycles then Ai(C) = 1, and vice 
versa. 
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Figure 2.2: Dependency distribution of nodes of 10^ random graphs from the ensemble G§ with 
s = 100, p = 0.0025. The average dependency is 0.33 and the standard deviation of dependency 
values is 0.76. The straight line plots the function Aex.p{—ad) with A = 0.25, a = —0.87. This 
is the best straight line fit to the dependency distribution on a semi-log plot. 



2.9 Random graphs 



A random graph is an ens emble of graphs vy i th eac h member of the ensemble having an associated 
probability (introduced bv lErdos and Renvil. Il959l ). Two ensembles have been extensively studied. 



One, denoted G[, is the ensemble of graphs with s vertices and / links (and no self-links) with 
a uniform associated probability. The second is the ensemble of graphs, denoted Gf , having s 
vertices, with each possible link (except self-links) present with probability p; in other words the 
ensemble of all graphs with the probability associated with a graph being — py^'^~^^~^ if it 
has / links. These two ensembles have the same properties in the s — > oo limit if / = ps{s — 1) 
([Bollobasl. Il99£ . 2001 ). For the random graph Gf the average number of links is ps(s — 1) whi le 
the (total) degree distribution is binomial: P{k) = '^'-'^CkP^{l-p)'^'~'^~^ f BQl lQba<J.llQQ3.l200lh . 
Figu re IZ2l shows the dependency distribution of nodes of 10'' graphs picked from the ensemble Gf 
with s = 100, p = 0.0025 (the program rndgraph.cpp that is included in the attached CD, see 
appendix C, was used to generate the random graphs). The dependency averaged over all nodes 
was 0.33, and the standard deviation of dependency values was 0.76. The distribution appears to 
decline exponentially as a function of dependency value but much slow e r than would be expected 
for a Poisson distribution with the same mean. See jBollobaj . llQQsl. 2QQ1 ) for further results 
concerning random graphs. 



Chapter 3 

Autocatalytic Sets 



This chapter discusses the notion of an 'autocatalytic set', which will play an important role in the 
dynamics of the model studied in this thesis. After first providing a graph-theoretic definition of an 
autocatalytic set, I discuss its relationship with the Perron-Frobenius eigenvectors of a graph. The 
rest of the chapter analyzes the structure of PFEs of different graphs, both ones which contains 
autocatalytic sets and ones which do not. The results derived here will prove useful in the analysis 
of the dynamical system discussed in chapterlH 



3. 1 Autocatalytic sets (ACSs) 



Definition 3.1: Autocatalytic set (ACS). 

An autocatalytic set is a graph, ea ch of whose nodes has a t least one incoming link from a 



node belonging to the same graph l|jain and Krishna. llQQa ) 



The concept of an ACS was introduced in the context of a set of catalytically interacting molecules 
where it was defined to be a set of mo l ecular species that conta i ns, vy ithin itself, a catalyst for 
each of its member species ijEi genl. [l971 : KaufFman . IQyJjRosslelllQ?]] ). Such a set of molecular 
species can collectively self-replicate under certain circumstances even if none of its component 
molecular species can individually self-replicate. This definition matches the one above if you 
imagine a node in a directed graph to represent a molecular species and a link from j to i to signify 
that j is a catalyst for i. 

Figure im shows various ACSs. The simplest is a 1-cycle; Figure ITlb . Figures ITlb and 13.1b 
are graphs that are irreducible as well as cycles. ITTb is an ACS that is not an irreducible graph 
and hence not a cycle, while l3.lH and 13.1b are irreducible graphs that are not cycles. 



Proposition 3.1: (i) All cycles are irreducible graphs and all irreducible graphs are ACSs. 
(ii) Not all ACSs are irreducible graphs and not all irreducible graphs are cycles. 
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Figure 3.1: Examples of autocatalytic sets (ACSs). a. A 1-cycle, the simplest ACS. b. A 2-cycle. 
c. An ACS that is not an irreducible graph. d,e Examples of ACSs that are irreducible graphs but 
not cycles. 



3.2 Relationship between Perron-Frobenius eigenvectors and 
autocatalytic sets 

The ACS is a useful graph-theoretic construct in part because of its connection with the PFE. 
Firstly, the Perron-Frobenius eigen value of a graph is an indicator of the existence of an ACS in 
the graph jjain and Krishna . 199g. 1999 ): 



Proposition 3.2: (i) An ACS must contain a closed walk. Consequently, 

(ii) If a graph C has no ACS then Ai(C) = 0. 

(iii) If a graph C has an ACS then Ai(C) > 1. 

Let X be a PFE of a graph. Consider the set of all nodes i for which Xi is non-zero. I will call the 
subgraph induced by these nodes the 'subgraph of the PFE x'. For example consider the graph in 
Figure lT^ . For this graph Ai = 1. Figure IT^ shows a PFE of the graph and how it satisfies the 
eigenvalue equation. For this PFE, nodes 1, 5 and 6 have Xi = 0. Removing these nodes produces 
the PFE subgraph shown in Figure IT^ . If all the componen ts of the PFE are non-zero the n the 
subgraph of the PFE is the entire graph. One can show that ijjain and Krishnal. ll9QRl. ll9QQh: 



Proposition 3.3: If Ai(C) > 1, then the subgraph of any PFE of C is an ACS. 

For the PFE of Figure IT^ this is immediately verified by inspection. Note that this result relates 
an algebraic property of a graph, its PFE, to a topological structure, an ACS. Further, this result 
is not true if we considered irreducible graphs instead of ACSs. 

Proposition 3.4: Let x be a PFE of a graph C, and let C denote the adjacency matrix of the 
subgraph of x. Let Ai(C") denote the Perron-Frobenius eigenvalue of C. Then Ai(C") = 
Ai(C) and C must contain at least one of the basic subgraphs of C. 



Figure IT2l illustrates this point. The adjacency matrix of the PFE subgraph, C", is obtained by 
removing rows 1, 5, 6 and columns 1, 5, 6 from the original matrix. Figure IT2l j illustrates that 



3.3. Eigenvector profile theorem 
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Figure 3.2: a. A graph with 6 nodes, b. x is an eigenvector of its adjacency matrix C with eigen- 
value Ai = 1, which is the Perron-Frobenius eigenvalue of the graph. The non-zero components of 
X and the corresponding rows and columns of C are shaded, c. The subgraph of the PFE x. The 
subgraph is an ACS and contains one of the basic subgraphs of the graph in (a), d. The vector 
x' constructed by removing the zero components of x is an eigenvector of the adjacency matrix, 
C , of the PFE subgraph. Its corresponding eigenvalue is unity, which is also the Perron-Frobenius 
eigenvalue of the PFE subgraph. 



the vector constructed by removing the zero components of the PFE is an eigenvector of C with 
eigenvalue 1. C contains one basic subgraph of C - the one induced by the nodes 2 and 3. 

Definition 3.2: Simple PFE. 

If the subgraph of a PFE conta ins only one basic subgraph then the PFE is termed a simple 
PFE jjain and Krishna . 2003a ). 



The PFE in Figure IT2l is simple. 

3.3 Eigenvector profile tlieorem 

I now present a series of propositions that describe some aspects of the 'profile' of possible PFEs of 
a graph, i.e., which components of the PFE are zero and which are non-zero. These propositions 
lead toward the eigenvector profile theorem which provides an algorithm for finding the profile 
of all the simple PFEs of a graph, and consequently the profile of the possible non-simple PFEs 
also. This theorem is completely general, applying to any given graph. After I had generated 
the proofs for the theo rem and the prop ositions it depends upon, I found that these results had 



already been derived by lRothbluml l|l975l ) as part of a more general theorem concerning the profile 



of generalized eigenvectors of a non-negative matrix. The proofs provided in appendix A (Krishna 
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and Jain, forthconning) are, however, different from his; they nnake no use of the principle of 
mathematical induction, which Rothblum uses extensively. 

For convenience I will often abbreviate statements like 'there exists a PFE in which the com- 
ponents corresponding to the nodes in subgraph Ciare non-zero/zero' to 'there exists a PFE in 
which the nodes in subgraph Ci are non-zero/zero'. 

Proposition 3.5: If a node is non-zero in a PFE then all nodes it has access to are non-zero in 
that PFE. 

It follows that if any node of an irreducible subgraph is non-zero in a PFE then all the other nodes 
in that subgraph must be non-zero in that PFE. A situation where some nodes of an irreducible 
subgraph are non-zero and some are zero in the same PFE cannot happen. 

Proposition 3.6: If a node is upstream from a basic subgraph then that node is zero in any PFE. 

Proposition 3.7: If a node does not have access from a basic subgraph then it is zero in any PFE. 

Propositions 3.6 and 3.7 describe which nodes will be zero in the PFEs, while proposition 3.4 and 
3.5 describe which nodes will be non-zero - at least one of the basic subgraphs will be non-zero in 
every PFE along with all nodes it has access to. In other words the subgraph of any PFE consists of 
one or more of the basic subgraphs and those nodes that they have access to, i.e., the subgraph of 
any PFE is necessarily an ACS (recall proposition 3.3, this is an alternate proof of that proposition). 

Propositions 3.4-3.7 can be used to prove the following theorem about the profile of PFEs of 
any graph: 

Theorem 3.1: Eigenvector profile theorem 

For any graph C, determine all the basic subgraphs of C. Denote them by Di, . . . , Dk ■ 
Determine which of these does not have any other Di downstream from it. Denote these by 
El, ... , En- Then: 

(i) For each i = \, . . . ,N there exists a unique (upto constant multiples) PFE in which the 
nodes of Ei and all nodes having access from them are non-zero and all other nodes are 
zero. It is evident that these PFEs are simple. Moreover these are the only simple PFEs of 
the graph C . 

(ii) Any PFE is a linear combination of these simple PFEs. 



3.4 Core and periphery of a simple PFE 



Definition 3.3: Core and periphery of a simple PFE. 

If C is the subgraph of a simple PFE, the basic subgraph contained in C will be called the 
core of the simple PFE (or equivalently, the 'core of C), and denoted Q. The set of the 
remaining nodes and links of C that are not in its core w ill together be said to constitute 



the periphery of the simple PFE ()Jain and Krishna . l2Q03a 



3.5. Core and periphery of a non-simple PFE 
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Figure 3.3: Ai is a measure of the multiplicity of internal pathways in the core of a simple PFE. 
Four irreducible graphs are shown. An irreducible graph always has a unique PFE that is simple 
and whose core is the entire graph. The Perron-Frobenius theorem ensures that adding a link to 
the core of a simple PFE necessarily increases its Perron-Frobenius eigenvalue Ai. 



For example, for the PFE in Figure IT^ the core is the 2-cycle comprising nodes 2 and 3. In 
Figure IT2l the core nodes are black and the periphery nodes grey. Note that the periphery is not a 
subgraph because it contains links not just between periphery nodes but also from nodes outside 
the periphery (like the link from node 3 to 4 in Figure IT2b ). 

It follows from the Perron-Frobenius theorem for irreducible graphs that Xi{Q) will necessarily 
increase if any link is added to the core. Similarly removing any link will decrease Xi{Q). Thus Ai 
measures the multiplicity of internal pathways in the core. Figure [T3l illustrates t his point. 

Th e core and periphery can be shown to have the following topological property ( Jain and Krishnal. 

2nn3ah: 



Proposition 3.8: Every node in the core of a simple PFE has access to every other node of the 
PFE subgraph. No periphery node has access to any core node. 

Thus starting from the core one can reach the periphery but not vice versa. 

3.5 Core and periphery of a non-simple PFE 

Because any PFE of a graph can be written as a linear combination of the simple PFEs (which, 
upto constant multiples, are unique for any graph), the definitions of core and periphery can be 
readily extended to any PFE as follows: 

Definition 3.4: Core and periphery of a (non-simple) PFE. 

The core of a PFE, denoted Q, is the union of the cores of those simple PFEs whose linear 
combination forms the given PFE. The rest of t he nodes and links of the PFE subgraph 
constitute its periphery ( Jain and Krishnal. 2QQ3a ). 



It follows from the above discussion that Xi{Q) = Ai(C). When the core is a union of disjoint 
cycles then Xi{Q) = 1, and vice versa (recall proposition 2.2). 
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xi=(l,0,0,0,0,0,0)'" 

X2=(0,0,0, 1,0,0,0)^ 
X3=(0,0,0,0,0,l,0)^ 
X4=(0,0,0,0,0,0,l)^ 



Figure 3.4: A graph that has no ACS. The graph has four simple PFEs (upto constant multiples). 



® 




x=(0,0,0,0,l,l,l//3 

Figure 3.5: A graph that has an ACS, but only one basic subgraph, along with various non-ACS 
structures. The only simple PFE of the graph is also displayed. 



x=(0,0,l,l//2 



Figure 3.6: One 2-cycle downstream from another. Both 2-cycles are basic subgraphs. The graph 
has one simple PFE in which only the downstream cycle is non-zero. 



3.6. The profile of PFEs when there is no ACS 
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3.6 The profile of PFEs when there is no ACS 

If Ai(C) = 0, the graph has no ACS, then the profile of PFEs is as follows: for every node that has 
no outgoing links, there is a PFE with that node non-zero and all other nodes zero. For example 
the graph in Figure IT4l has four (upto constant multiples) simple PFEs that are displayed in the 
same figure. This follows from theorem 3.1 because for a graph with Ai = each node is a basic 
subgraph. A general PFE is a linear combination of all such simple PFEs. Because there is no 
closed walk there is no core (or periphery) for any PFE of the graph. The core of all PFEs of such 
a graph may be defined to be the null set, Q = ^. 

3.7 The profile of PFEs when there is an ACS but only one basic 
subgraph 

Now consider a graph that has Ai > 1, i.e., it contains an ACS, but has only one basic subgraph. 
In addition to the ACS it could possibly have various non-ACS structures such as chains, trees and 
isolated nodes. An example is the graph in Figure ITsl For such a graph theorem 3.1 says that 
there is only one simple PFE in which the nodes in the basic subgraph and all nodes having access 
from it are non-zero, while the other nodes are zero (displayed in Figure ITs]) . In particular the 
nodes in all non-ACS structures are zero in the (unique) PFE of the graph because they are don't 
have access from the basic subgraph. The core nodes (nodes of the single basic subgraph) are 
black and the periphery nodes are grey in Figure ITsl The white nodes are the nodes that are zero 
in the PFE of the graph. 

3.8 The profile of PFEs when there is an ACS and many basic 
subgraphs 

For any graph containing an ACS, i.e., one that has Ai > 1, there are possibly several simple 
PFEs as specified by theorem 3.1. However it follows from the theorem that in all the simple PFEs 
the nodes of non-ACS structures will be zero because they are not downstream from any basic 
subgraph. 

While the nodes of non-ACS structures are guaranteed to be zero, it can also happen that the 
nodes of some basic subgraphs are zero in every PFE. Figure illustrates such a case where the 
graph consists of one 2-cycle downstream from another. The Perron-Frobenius eigenvalue of this 
graph is Ai = 1, and each 2-cycle is a basic subgraph. From theorem 3.1 it follows that this graph 
has only one (simple) PFE (displayed in Figure [T6]l in which the upstream 2-cycle is zero. 
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Chapter 4 

Population Dynamics 



4. 1 The population dynamics equation 

In this chapter, I will analyze the attractors of the dynamical system described by the equation: 

s s 

Xi = ^ ^ ^ij'^j "^i ^ ^ (^kj'^j) (^■■'-) 
j=l k,j=l 

where x lies on the simplex J = {x = {xi,X2, • • • , Xs)'^ G R*|0 < < 1, Yli=i — 1} of 
normalized non-negative vectors in s dimensions. 

As discussed in section ITtI the set of variables Xi can be thought of as 'living' on the nodes 
of a graph whose adjacency matrix is C = (cy). The links of the graph represent the interactions 
between the variables Xj. It is useful to see how equation l|4.ip arises in a chemical context. 

Let i G {1, . . . , s} denote a chemical (or molecular) species in a well-stirred chemical reactor. 
Molecules can react with one another in various ways; I focus on only one aspect of their inter- 
actions: catalysis. The catalytic interactions can be described by a directed graph with s nodes. 
The nodes represent the s species and the existence of a link from node j to node i means that 
species j is a catalyst for the production of species i. In terms of the adjacency matrix, C = (cij) 
of this graph, cij is set to unity if j is a catalyst of i and is set to zero otherwise. The operational 
meaning of catalysis is as follows: 

Each species i will have an associated non-negative population m in the reactor that changes 
with time. Let species j catalyze the ligation of reactants A and B to form the species i, A+B i. 
Assuming that the rate of this c atalyzed re a ction is given by the Michaelis-Menten theory of enzyme 



catalysis, iji = ymaxdbj^^^ (|Gutfreundl . 119951 ). where a,b are the reactant concentrations, and 



Vmax and Km are constants that characterize the reaction. If the Michaelis constant Km is 
very large this can be approximated as iji oc yjab. Combining the rates of the spontaneous and 
catalyzed reactions and also putting in a dilution flux (p, the rate of growth of species i is given by 
i/i = k(l + h'yj)ab — (pyi, where k is the rate constant for the spontaneous reaction, and u is the 
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catalytic efficiency. Assuming the catalyzed reaction is much faster than the spontaneous reaction, 
and that the concentrations of the reactants are non-zero and fixed, the rate equation becomes 
yi = Kyj — (pyi, where K is a constant. In general because species i can have multiple catalysts, 
Vi — X]j=i ^ijUj ~ 4'yi< with Kij ~ Cij. I make the further idealization Kij = Cij giving: 

s 

yi = ^Cijyj - (t)yi. (4.2) 
i=i 

The relative population of species i is by definition Xi = J/i/ X]j=i Vj- As < Xi < 1, Yli=i — 
1, X = (xi, . . . G J. Taking the time derivative of Xi and using ()4.2p it is easy to see that 

Xi is given by ()4.H1 . Note that the (j) term, present in ()4.2jl . cancels out and is absent in l|4.1|l . 

Because we ultimately ignore the concentrations of the reactants we would get the same rate 
equations even if the reaction scheme were different, e.g., if only one reactant was required as in 
the reaction A ^ i. Such dynamics might also be relevant in an economic context where the 
presence of one commodity increases the rate of produc tion of another commodity. 

Equation ()4.1jl has been used by Eigen et al. l|l989 ^ to describe the dynamics of the relative 



populations of self-replicating strings. Each node of C represents one string. Two strings are 
connected by an undirected link if mutations of one string can produce the other; the strength 
of the link being a decreasing function of the number of mutations required. Eigen et al. have 
determined the attractor of equation (|4.1j) for the specific form this interpretation imposes on C . 
Below, I analyze the attractors of equation ()4.ip for an arbitrary graph C, with no restrictions 
placed on its structure. 

4.2 Attractors of thie population dynamics equation 

An important motivation for the choice of the population dv namics 114.11) is it's analyt ical tractabil- 
ity. The following properties of the attractor can be derived (|Jain and Krishna!. l2003al: Krishna and 
Jain, forthcoming): 

Proposition 4.1: For any graph C, 

(i) Every eigenvector of C that belongs to the simplex J is a fixed point of equation ()4.ip . 
and vice versa. 

(ii) Starting from any initial condition in the simplex J, the trajectory converges to some 
fixed point (generically denoted X) in J. 

(iii) For generic initial conditions in J, X is a Perron-Frobenius eigenvector (PFE) of C . 

(iv) If C has a unique (upto constant multiples) PFE, it is the unique stable attractor of 
equation l|4.1|l . 



4.2. Attractors of the population dynamics equation 
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(v) If C has more than one linearly independent PFE, then X can depend upon the initial 
conditions. The set of allowed X is a linear combination of a subset of the PFEs. The interior 
of this set in J may then be said to be the 'attractor' of l|4.1j) . in the sense that for generic 
initial conditions all trajectories converge to a point in this set. 

Statement (i) is easy to see: let x'^ G J be an eigenvector of C, Y^jCijxj = \x}. Substituting 
this on the r.h.s. of 1)4. one gets zero. Conversely, if the r.h.s. of 1)4. l|l is zero, x = x'^, with 

Appendix A provides proof of the above propositions but they can be motivated by considering 
the underlying dynamics ()4.2jl from which 1)4. ip is derived. Because ()4.1jl is independent of 4>, we 
can set = in ()4.2p without any loss of generality. With = the general solution of ()4.2jl . 
which is a linear system, can be schematically written as: 

y(t) = e^*y(0), 

where y(0) and y(i) are viewed as column vectors. Suppose y(0) is a right eigenvector of C with 
eigenvalue A, denoted y'^. Then 

y(t) = e^'y\ 

As this time dependence is merely a rescaling of the eigenvector, this is an alternative way of 
showing that x'^ = y^/X]j=iyj' a fixed point of ()4.1jl . If the eigenvectors of C form a basis 
in W , y(0) is a linear combination: y(0) = X];^aAy'^- In that case, for large t it is clear that the 
term with the largest value of Re(A) will grow fastest, hence, 

y(t) *^°°e^V\ 

where Ai is the eigenvalue of C with the largest real part (which is the same as its Perron- 
Frobenius eigenvalue) and y'^^ an associated eigenvector. Therefore, for generic initial conditions 
the trajectory of ()4.ip will converge to X = x^^ a PFE of C. If the eigenvectors of C do not form 
a basis in i?*, the above result is still true (see appendix A). 

Note that Ai can be interpreted as the 'population growth rate' at large t, because y(t) 
Aiy. In section IT4l I had mentioned that Ai measures a topological property of the graph, the 
multiplicity of internal pathways in the core of the graph. Thus, in the present model, Ai has both 
a topological and dynamical significance, which relates two distinct properties of the system, one 
structural (multiplicity of pathways in the core of the graph), and the other dynamical (population 
growth rate). The higher the multiplicity of pathways in the core, the greater is the population 
growth rate of the dominant ACS. 
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4.3 Attractor profile theorem 

While any eigenvector of C is a fixed point it need not be stable. In other words, only for special 
initial conditions, forming a space of measure zero in J, can X be some other eigenvector of C, not 
a PFE. Henceforth, I ignore such special initial conditions. However, not all PFEs are attractors 
either. The following theorem indicates which subset of PFEs form the attractor set (Krishna and 
Jain, forthcoming): 

Theorem 4.1: Attractor profile theorem 

1) Determine all the strong components of the given graph C. Denote these by Ci, . . . , Cm- 

2) Determine which of these are basic subgraphs. Denote them by Di, . . . , Dk- 

3) Construct a graph, denoted D* , with K nodes, representing the Di, and a link from node 
j to node i if there is a path from any node of Dj to any node of Di that does not contain 
a node of any other basic subgraph. 

4) Determine which of the Di are at the ends of the longest paths in the above graph D* . 
Denote these hy Fi,i = \, ... ,N . 

For each i = 1,...,A^ there exists a unique (upto constant multiples) PFE that has only 
nodes of Fj and all nodes having access from them non-zero, and all other nodes zero. The 
attractor set consists of all linear combinations of these PFEs that lie on the simplex J. 



4.4 The attractor for a graph with no ACS 



For any graph that consists of only c hains, trees and isolated nodes , i.e. any graph with Ai 
the theorem reduces to the following ()Jain and Krishna!. Il998l. 11999 ): 



0, 



Proposition 4.2: For any graph with Ai(C) = 0, in the attractor only the nodes at the ends of 
the longest paths are non-zero. All other nodes are zero. 

This follows from theorem 4.1 because for a graph with Ai = each node is a basic subgraph. 
Appendix A has an alternate proof based directly on the underlying yi dynamics. Figure l4~l] shows 
a graph that has Ai = 0. The white nodes are those that are zero in the attractor and the grey 
node is the only non-zero node in the attractor. This graph is the same as the one in Figure lT4l 
Notice that of the four PFEs of the graph only one, (0, 0, 0, 1, 0, 0, 0)-^ , is the attractor. 



4.5 The dominant ACS of a graph 



For graphs with an ACS, i.e., with Ai > 1, recall that proposition 3.3 showed a close relationship 
between the PFEs of the graph and ACSs. A similar result holds for the attractor (IJain and Krishn; 



Il99a 



mi): 
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Figure 4.1: A graph that has no ACS. Only node 4, coloured grey, is non-zero in the attractor. 



Proposition 4.3: For any graph C, 

(i) For every X belonging to the attractor set, the set of nodes i for which Xj > is the 
same and is uniquely determined by C. The subgraph formed by this set of nodes will be 
called the 'subgraph of the attractor' of 1)4.111 for the graph C. 

(ii) If Ai(C) > 1, the subgraph of the attractor is an ACS. 

Definition 4.1: Dominant ACS. 

For any graph with Ai > 1 th e subgraph of the attractor, w hich is an ACS, will be called the 



dominant ACS of the graph jjain and Krishna . 1998 . 1999 ) 



The dominant ACS is independent of (generic) initial conditions and is the subgraph induced by 
the set of nodes belonging to all the Fi and all nodes having access from them in C. The following 
sections use theorem 4.1 to determine the structure of the dominant ACS of several different types 
of graphs. 



4.6 Examples of the attractor for specific graphis 

It is instructive to consider examples of graphs and see how the trajectory converges to a PFE, and 
what the dominant ACS looks like in each case. 



Example 1. A simple chain, Figure lOb : 

The adjacency matrix of this graph has all eigenvalues zero; Ai = 0. The unique PFE (upto con- 
stant multiples) of this graph is e = (0,0,1)-^. Because node 1 has no catalyst, its rate equation 
is (henceforth taking cp = 0) yi = 0. Therefore yi{t) = yi(0), which is a constant independent 
of t. The rate equation for node 2 is 1/2 = yi = yi(0). Thus y2{t) = 2/2(0) + j/i(0)t. Similarly 
m = y2 implies that ysit) = {l/2)yi{0)f + y2(0)t + ^3(0). At large t, yi = constant, y2 ~ t, 
ys ~ t^; hence 2/3 dominates. Therefore, Xi = limt^oo Xi{t) is given by Xi = 0, X2 = 0,^3 = 1. 
Thus X equals the unique PFE e, independent of initial conditions. 
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Figure 4.2: Examples of graphs with a unique PFE (upto constant multiples), which is, therefore, 
the unique stable attractor. The coloured nodes show the subgraph of the PFE, i.e., the nodes 
that are non-zero in the attractor. The black nodes are core nodes, the grey nodes are periphery 
nodes and the white nodes are the nodes that are zero in the attractor. 



4.6. Examples of the attractor for specific groplns 
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Example 2. A 1-cycle, Figure lOb : 

This graph has two eigenvalues, Ai = 1, A2 = 0. The unique PFE is e = (1,0)^. The rate 
equations are yi = 2/1,2/2 = 0, with the solutions yi(t) = yi{0)e^ ,y2{t) = ^2(0). At large t node 
1 dominates, hence X = (1,0)^ = e. The exponentially growing population of 1 is a consequence 
of the fact that 1 is a self-replicator, as embodied in the equation yi = yi. 

Example 3. A 2-cycle, Figure 14?^ : 

The corresponding adjacency matrix has eigenvalues Ai = 1,A2 = —1. The unique normalized 
PFE is e = (1,1)^/2. The population dynamics equations are yi = 2/2,2/2 = 2/i- 

The general solution to these is (note y\ = yi) 

2/1 (t) = Ae* + Be-\ 2/2(0 = Ae^ - Be'K 

Therefore at large t, 2/1 j4e*, 2/2 — > ^e*, hence X = (1, 1)^/2 = e. Neither 1 nor 2 is individually 
a self-replicating node, but collectively they function as a self-replicating entity. 

Example 4. A 2-cycle with a periphery. Figure lOH : 

This graph has Ai = 1 and a unique normalized PFE e = (1, 1, 1)^/3. The population equations 
for 2/1 and 2/2 and consequently their general solutions are the same as Example 3, but now in 
addition 2/3 = 2/2, yielding yz{t) = Ae^ + Be~* + constant. Again for large t, 2/1,2/2,2/3 grow as 
~ Ae^, hence X = (1, 1, l)"^/3 = e. The dominant ACS includes all the three nodes. 

This example shows how a parasitic periphery (which does not feed back into the core) is 
supported by an autocatalytic core. This is also an example of the following general result: when 
a subgraph C , with largest eigenvalue A'^, is downstream from another subgraph C" with largest 
eigenvalue X" > A'^, then the populations of the former also grow at the rate at which the popu- 
lations of C" are growing. Therefore if C" is non-zero in the attractor, so is C . In this example 
C is the single node 3 with X[ = and C" is the 2-cycle of nodes 1 and 2 with A'/ = 1. 

Example 5. A 2-cycle and a chain. Figure I4r2fe : 

The graph in Figure 14?^ combines the graphs of Figures 14?^ and c. Following the analysis of 
those two examples it is evident that for large t, 2/1 ~ i°,2/2 ~ ^^,2/3 ~ f^-.Vi ~ e*,2/5 ~ e*. 
Because the populations of the 2-cycle are growing exponentially they will eventually completely 
overshadow the populations of the chain which are growing only as powers of t. Therefore the 
attractor will be X = (0, 0, 0, 1, l)'^/2 which, it can be verified, is a PFE of the graph (it is an 
eigenvector with eigenvalue 1). In general when a graph consists of one or more ACSs and other 
nodes that are not part of any ACS, the populations of the ACS nodes grow exponentially while 
the populations of the latter nodes grow at best as powers of t. Hence ACSs always outperform 
non-ACS structures in the population dynamics (see also Example 2). 
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Example 6. A 2-cycle and another irreducible graph disconnected from it, Figure 14^ : 
One can ask, when there is more than one ACS in the graph which is the dominant ACS? Figure|EF 
shows a graph containing two strong components. The 2-cycle subgraph has a Perron-Frobenius 
eigenvalue 1, while the other strong component has a Perron-Frobenius eigenvalue \/2. The unique 
PFE of the entire graph is e = (0, 0, 1, \/2, 1)^/(2 + \/2) with eigenvalue \/2. The population 
dynamics equations are yi = ^2,2/2 = 2/1,2/3 = 2/4,2/4 = 2/3 + 2/5,2/5 = ^4- The first two equations 
are completely decoupled from the last three and the solutions for yi and 2/2 are the same as for 
Example 3. For the other irreducible graph the solution is (because y\ = 7/3 + 2/5 = 2^/4) 

y^it) = Ae^' + Be-^\ ysit) = -^{Ae^' + iJe"^*) + C, 

V2 

y,{t) = ^{Ae^' + Be-^')-C. 

Thus, the populations of nodes 3, 4 and 5 also grow exponentially but at a faster rate, reflecting 
the higher Perron-Frobenius eigenvalue of the subgraph comprising those nodes. Therefore, this 
structure eventually overshadows the 2-cycle, and the attractor is X = e. The dominant ACS in 
this case is the irreducible subgraph formed by nodes 3, 4 and 5. More generally, when a graph 
consists of several disconnected ACSs with different individual Ai, only the ACSs whose Ai are the 
largest (and equal to Ai(C)) end up with non-zero relative populations in the attractor. 



Example 7. A 2-cycle downstream from another 2-cycle, Figure 14^2^ : 

What happens when the graph contains two ACSs whose individual Ai equals Ai(C), and one 
of those ACSs is downstream of another? In Figure 1412^ nodes 3 and 4 form a 2-cycle that is 
downstream from another 2-cycle comprising nodes 1 and 2. The unique PFE of this graph, with 
Ai = 1, is e = (0,0,1,1)^/2. The population dynamics equations are yi = 2/2,2/2 = 2/1,2/3 = 
2/4 + 2/2,2/4 = 2/3- Their general solution is: 

yi{t) = Ae* + Be-\ y2{t) = Ae* - Be"*, 
2/3(0 = \{Ae' - Be-') + Ce* + De-\ 

y,{t) = \{Ae' + Be-') + {C - \)e' + (| - D)e-K 

It is clear that for large t, y\ ~ e*,y2 ~ e*,y3 ~ ie*,y4 ~ te*. While all four grow exponentially 
with the same rate Ai, as f — > 00 7/3 and 2/4 will overshadow 2/1 and 2/2- The attractor will be 
therefore be X = (0, 0, 1, 1)^/2 = e. Here the dominant ACS is the 2-cycle of nodes 3 and 4. This 
result generalizes to other kinds of ACSs: if one strong component is downstream of another with 
the same Perron-Frobenius eigenvalue, the latter will have zero relative population in the attractor. 
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Figure 4.3: Examples of graphs with multiple PFEs. a. ei,e2,e3 are all eigenvectors with eigen- 
value Ai = 0. Only 63 is the attractor. Thus for generic initial conditions, only node 7, which sits 
at the end point of the longest chain of nodes is non-zero in the attractor. b. 61,62,63 are all 
eigenvectors with eigenvalue Ai = 1, but only 63 is the attractor. Only the 2-cycle, of nodes 11 
and 12, which sits at the end of the longest chain of cycles, is non-zero in the attractor. 



The above examples displayed graphs with a unique PFE, and illustrated proposition 4.1(iv). 
The stability of the attractor follows from the fact that the constants A, B, C, D, etc., in the above 
examples, that can be traded for the initial conditions of the populations, do not appear anywhere 
in the attractor X. Now I consider examples where the PFE is not unique. 

Example 8. Graph with Ai = and three weakly connected components. Figure lOb : 
This graph has three independent PFEs, displayed in Figure l43b . The attractor is X = 63. This 
is an immediate generalization of Example 1 above. Using the same argument as for Example 1, 
we can see that m ~ t'' if the longest path ending at node i is of length k. Thus, the populations 
of nodes 1, 2, 3 and 5 are constant, those of 4 and 6 increase ~ t for large t, and of 7 as ~ t^. 
Therefore, only nodes at the ends of the longest paths will be non-zero in the attractor. 

Example 9. Several weakly connected components containing 2-cycles, Figure l43b : 
Here again there are three PFEs, one for each weakly connected component. The population of 
nodes in 2-cycles that are not downstream of other 2-cycles (nodes 1,2,3,4,7 and 8) will grow 
as e*. As in Example 7, Figure 14^ . the nodes of 2-cycles that are downstream of one 2-cycle 
(nodes 5, 6, 9 and 10) will grow as ie*. It can be verified that the populations of nodes in 2-cycles 
downstream from two other 2-cycles (nodes 11 and 12) will grow as t^e*. The pattern is clear: 
in the attractor only the 2-cycles at the ends of the longest chains of 2-cycles will have non-zero 
relative populations. Therefore, the attractor is X = 63. 



4.6. Examples of the attractor for specific graplns 
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4.7 Timescale for reaching the attractor 



How long does it take the system to reach the attractor? This depends on the structure of the 
graph C. For instance in Exannple 2, the attractor is approached as the population of node 1, yi, 
overwhelms the population y2- Because yi grows exponentially as e*, the attractor is reached on a 
timescale X^^ = 1. (In general, when I say that 'the timescale for the system to reach the attractor 
is r', I mean that for t » r, x(t) is 'exponentially close' to its final destination X = linif^oo x(t), 
i.e. for all i, \xi{t) — Xi\ ~ e~*/^i", with some finite a.) In contrast, in Example 1, the attractor 
is approached as ys overwhelms yi and y2- Because in this case all the populations are growing as 
powers of t, the timescale for reaching the attractor is infinite. When the populations of different 
nodes are growing at different rates, this timescale depends on the difference in growth rate between 
the fastest growing population and the next fastest growing population. 

For graphs with Ai = like those in Example 1 and 8, all populations grow as powers of t, 
hence the timescale for reaching the attractor is infinite. 

For graphs that have Ai > 1 but all the basic subgraphs are in different weakly connected com- 
ponents, such as Examples 2-6, the timescale for reaching the attractor is given by (Ai — ReA2)~^, 
where A2 is the eigenvalue of C with the next largest real part, compared to Ai. 

For graphs having Ai > 1 with at least one basic subgraph downstream from another basic 
subgraph, the ratio of the fastest growing population to the next fastest growing one will always 
be a power of t (as in Examples 7 and 9), therefore, the timescale for reaching the attractor is 
again infinite. 



As the dominant ACS is specified by a particular PFE, I will define the core of the dominant ACS to 
be the core of the corresponding PFE. If the PFE is simple, the core of the dominant ACS consists 
of just one basic subgraph. If the PFE is non-simple the core of the dominant ACS will be a union 
of more than one basic subgraph. The dominant ACS is uniquely determined by the graph. This 
motivates the definition of the core and periphery of a graph: 

Definition 4.2: Core and periphery of a graph. 

The core of a graph C, denoted Q{C), is the core of the dominant ACS of C, if an ACS 
exists in the graph. The periphery of C is the periphery of the dominant ACS of C If no 



4.8 Core and periphery of a graph 



ACS exists in the graph, i.e., Ai(C) = then Q{C) = ^ 




In all cases Xi{Q{C)) = Ai(C). Henceforth all graphs will be depicted with the following colour 
scheme: black for core nodes, grey for periphery nodes (and for nodes with non-zero Xi when there 
is no ACS), and white for nodes that are not in any of the PFE subgraphs. 



4.9. Keystone nodes 
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Figure 4.4: Keystone nodes, a. All nodes of the graph are keystone, b. Nodes 3, 4 and 5 are 
keystone, c. Nodes 4 and 5 are keystone, d. None of the nodes are keystone. 



4.9 Keystone nodes 



In ecology certain species are referred to as keystone species - those vyhose extinction or r ennova 



would seriously disturb the ecosystenn (|Painel. 11963: 1.Jordan et a 1. 1. llQQQl: 



Sole and Montoval.l200lh . 



One might similarly look for the notion of a keystone node in a directed graph that captures some 
important organizational role played by a node. Consider the impact of the hypothetical removal 
of any node i from a graph C. One can, for example, study the structure of the core of the graph 
C — i that would result if node i (along with all its links) were removed from C. 



Definition 4.3: Core overlap. 

Given any two graphs C and C whose nodes are labeled, the core overlap between them, 
denoted Ov{C, C"), is the number of common links in the cores of C and C, i.e., the number 
of ordered pairs for which Qij and Q'^j are both non-zero . If eith er C or C does not 



have a core, Ov{C,C') is defined to be zero 



lin 



and Krishn; 



2nn2ah 



Definition 4.4: Keystone node. 

I will refer to a no de i of a graph C as a keystone node if C has a non-vanishing core and 
Ov{C, C -i)=0 (lain and Krishnal. EoES, Eq3)- 



Thus a keystone node is one whose removal modifies the organizational structure of the graph (as 
represented by its core) drastically. In each of Figures lOb -d. for example, the core is the entire 
graph. In Figure l4r4b . all the nodes are keystone, because the removal of any one of them would 
leave the graph without an ACS (and hence without a core). In general when the core of a graph 
is a single re-cycle, for any n, all the core nodes are keystone. In Figure l4!4b . nodes 3, 4 and 5 
are keystone but the other nodes are not, and in Figure ITU : only nodes 4 and 5 are keystone. In 
Figure 1441 1. there are no keystone nodes. These examples show that the more internal pathways 
a core has (generally, this implies a higher value of Ai), the less likely it is to have keystone nodes, 
and hence the more robust its structure is to removal of nodes. 
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Figure 4.5: Another example of a keystone node, 
because its removal produces the graph (b) which I 
core nodes of both graphs are coloured black. 



b) 




Node 3 is a keystone node of the graph (a) 
IS a zero core overlap with the graph (a). The 



Figure l4~5l illustrates another type of graph structure that has a keystone node. The graph in 
Figure l4~5b consists of two strong components - a 2-cycle (nodes 4 and 5) downstream from an 
irreducible subgraph consisting of nodes 1, 2 and 3. The core of this graph is the latter irreducible 
subgraph. Figure l4~5b shows the graph that results if node 3 is removed with all its links. This 
consists of one 2-cycle downstream from another. Though both 2-cycles are basic subgraphs of the 
graph, as discussed in Example 7, Figure 14^2^ . this graph has a unique (upto constant multiples) 
PFE, whose subgraph consists of the downstream cycle (nodes 4 and 5) only. Thus the 2-cycle 4-5 
is the core of the graph in Figure ITsb . Clearly Ov{C,C — 3) = 0. Therefore, node 3 in Figure 
14.5b is a keystone node. 

The above purely graph theoretic definition of a keystone node turns out to be useful in the 
dynamical system discussed in this and the following chapters. For other dynamical systems, 
alternate definitions of keystone might be more useful. 



Chapter 5 

Graph Dynamics 



In this chapter, I present a model of an evolving network that is a specific instance of the framework 
described in section ITtI The dynamical system described in the previous chapter is used for step 1 
of the dynamics, and certain rules are chosen for steps 2 and 3 that aim to capture some features 
of the evolution of chemical networks on the prebiotic Earth. 

5.1 Graph dynamics rules 

The initial graph is constructed as follows: For every ordered pair with i 7^ j (where i,j G 
S = {1,2,... ,s}), Cij is independently chosen to be unity with a probability p and zero with a 
probability 1 — p. ca is set to zero for all i e S. Thus the initial graph is a random graph, a 
member of the ensemble Gf described in section IZ9l Each is chosen randomly, with uniform 
probability, in [0, 1] and all Xj are rescaled so that Yli=i = 1- 

Step 1. With C fixed, x is evolved according to ()4.ip until it converges to a fixed point, 
denoted X. 

Step 2. The set C of nodes with the least Xi is determined, i.e, £ = G S\Xi = minjg^ Xj}. 

A node, say node k, is picked randomly from C and is removed from the graph along 
with all its links. 

Step 3. A new node (also denoted k) is added to the graph. Links to and from k to other 
nodes are assigned randomly according to the same rule, i.e, for every i ^ k oik and Cki 
are independently reassigned to unity with probability p and zero with probability 1—p, 
irrespective of their earlier values, and Ckk is set to zero. All other matrix elements of 
C remain unchanged. Xk is set to a small constant xq, all other Xj are perturbed by a 
small amount from their existing value Xi, and all Xj are rescaled so that X]i=i ^« — ^■ 

This process, from step 1 onward, is iterated many times. Figure [5?l] depicts the graph dynamics 
schematically. 
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Selection 
(step 2) 




Figure 5.1: Schematic depiction of the graph dynamics. Starting with a random graph, first the 
population dynamics is allowed to run until the relative populations reach their attractor (step 1). 
Then one of the nodes with the least relative population is removed from the graph (selection, step 
2). It is replaced by a new node which is assigned random links to existing nodes (novelty, step 3). 
This process is then iterated. 



5.2. Features of the graph dynamics 
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5.2 Features of the graph) dynamics 



5.2.1 Evolution in a prebiotic pool 



These graph dynamics rules are motivated by the puzzle of how a complex chemical organization 
might have emerged from an initial 'random soup' of chemicals. As discussed in section fLQl the 
rules model the imagined dynamics of an evolving chemical network such as might have existed 
in a pool on the prebiotic Earth. Several assumptions and idealizations underlie such a chemical 
interpretation of the model rules. Some of these have been mentioned in sections fl.9l and l4~Tl 
however I will defer a more detailed discussion to section lOl 

The model has two main so urces of inspiration. One is the set of mode l s studied by Farmer 
KaufFman, Packard and others llKaufFmanl. 11983. Il993l: iFarmer et al.[ Il986l: iBaelev and Farmer . 



1SS2 



Baglev et al.|.ll99ll ). and bv lFontanal (jl^ 



Fontana and Bussl ( 19941 ) (see also fstadler et 



Like these models, the present one employs an artificial chemistry of catalyzed 



reactions, albeit a much simpler one, in wh ich populations of specie s evolve over time. To this 

that the 'least fit species 



I add the feature, inspired by the model of iBak and Sneppen 



mutates' with 'fitness' here being equated to the relative population. Unlike the Bak-Sneppen 
model, however, the 'mutation' of a species also changes its links to other species. 



5.2.2 Coupling of population and graph dynamics: two timescales 

As discussed in section ITTl the coupling of the population dynamics and the graph dynamics is 
built into the framework of the model: the evolution of the Xj depends on the graph C in step 1, 
and the evolution of C in turn depends on the Xi through the choice of which node to remove in 
step 2. There are two timescales in the dynamics, a short timescale over which the graph is fixed 
while the Xi evolve, and a longer timescale over which the graph is changed. 



5.2.3 Absence of self-replicators 



While in previous sections I have considered graphs with 1-cycles, the requirement cu = in the 
present section forbids 1-cycles in the graph. The motivation is the following: 1-cycles represent 
self-replicating species (see previous section. Example 2). Such species, e.g., RNA molecules, are 
difficult to produce and maintain in a prebiotic scenario and it is possible that it requires a self- 



supporting rr i olecu 
Jovce et a I 



a r orga n izatio n to be in place before an RNA world, for example, can take off 
19871: IJovcd. Il989l ). Thus, I wish to address the question: can one get molecular 
organizations that can collectively self-replicate without introducing self-replicating species 'by 
hand' into the system? 
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Figure 5.2: The number of links versus time, n, for various runs. Each run had s = 100. The black 
curve is a run with no selection: a random node is picked for removal at each graph update. The 
other curves show runs with selection and with different p values: blue, p = 0.001; red, p = 0.0025; 
green, p = 0.005. 



5.2.4 Selection and novelty 

The rules for changing the graph implement selection and novelty, two important features of 
natural evolution. Selection is implemented by removing the node that is 'performing the worst', 
with 'performance' in this case being equated to a node's relative population (step 2). Adding a 
new node introduces novelty into the system (step 3). Note that although the actual connections of 
a new node with other nodes are created randomly, the new node has the same average connectivity 
as the initial set of nodes. Thus, the new node is not biased in any way toward increasing the 
complexity of the chemical organization. Steps 2 and 3 implement the interaction of the system 
with the external environment. The phenomena to be described in the following sections are all 
consequences of the interplay between selection, novelty and the population dynamics. 



5.3. Implementation 
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5.3 Implementation 

Two programs are described in appendix B, and are included in the attached CD, that use different 
methods to implement the graph dynamics. The programs differ in the way the attractor is 
determined at each time step. The first program uses theorem 4.1. The second numerically 
integrates equation 1)4. l|l to find its attractor. The programs are written in C++, with Matlab 5.2 
being used to determine eigenvalues and eigenvectors where required. 

5.4 Results of graph) evolution 

Figure shows the total number of links in the graph versus time (n, the number of graph 
updates). Three runs of the model described in the previous section, each with s = 100 and 
different values of p are exhibited. Also exhibited is a run where there was no selection (in which 
step 2 is modified: instead of picking one of the nodes of £, any one of the s nodes is picked 
randomly and removed from the graph along with all its links. The rest of the procedure remains 
the same). Figure lOl shows the time evolution of two more quantities for the same three runs 
with selection displayed in Figure ls!2l The quantities plotted are si, the number of nodes with 
Xi > 0, and Ai, the Perron-Frobenius eigenvalue of the graph. Figure lOI shows the evolution of 
another graph theoretic measure, the interdependency of the graph, d, for the run of Figure l53b 
(the data files for these runs are included in the attached CD, see appendix C). 

For all the displayed runs the size of the network has been taken to be s = 100. Computational 
constraints have prevented a detailed exploration of larger values of s, though the few runs I have 
studied with values of s upto 250 and ps < 1 have shown a similar behaviour. The values of the 
parameters p and s for the displayed runs were chosen to lie in the regime ps < 1, and much of 
the analytical work described in later chapters, such as estimation of various timescales, assumes 
that ps <C 1. This range is the one of interest from the point of view of the origin of life problem 
because it is likely that the catalytic networks existing in prebiotic pools would have been quite 
sparse. However, it is worthwhile to extend the analysis and simulations to other parameter ranges, 
such as large s or large p values, and compare with the ps < 1 behaviour. 

Coming back to the runs displayed in Figure l5^ it is clear why the number of links fluctuates 
about its random graph value ps'^ in the run without selection. This is because each graph 
update replaces a randomly chosen node with another that has on average the same connectivity 
and, therefore, the graph remains random like the starting graph. As soon as selection is added the 
behaviour becomes more interesting. Three regimes can be observed. First, the 'random phase' 
where the number of links fluctuates around ps'^, while si and d are small. Second, the 'growth 
phase' where /, si and J show a clear rising tendency. Finally, the 'organized phase' where I and 
d hover (with large fluctuations) about a value much higher than the initial random graph value, 
and si remains close to its maximum value s (with some fluctuations). The time spent in each 
phase depends on p and s. I analyze the structure of the graph in the three phases in chapterEl 
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Figure 5.3: Number of nodes with Xi > 0, si, (black curve) and the Perron-Frobenius eigenvalue 
of the graph, Ai, (red curve) versus time, n, for the same three runs shown in Figure [5?2l Each 
run has s = 100 and, a. p = 0.001, h. p = 0.0025 and c. p = 0.005, respectively. The Ai values 
shown have been multiplied by 100 to ease comparison with the si curves. 



5.4. Results of graph evolution 
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Figure 5.4: Interdependency vs. n for the run with s = 100 and p = 0.0025 that is displayed in 
Figure lS^b . 
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Figure 5.5: si vs. n for the same run displayed in Figure lOb (s = 100, p = 0.0025) for a longer 
time, till n = 50000. Repeated rounds of catastrophes and recoveries are seen. 
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Figure l5^ shows si for the same run as that of Figure l53b for a longer time, from n = 1 
to n = 50,000. In this long run one can see several sudden, large drops in si: 'catastrophes' in 
which a large fraction of the s nodes become extinct, i.e., the Xi values of the nodes fell to zero. 
Some of the drops seem to take the system back into the random phase, others are followed by 
'recoveries' in which si rises back towards its maximum value s. The recoveries are comparatively 
slower than the catastrophes, which in fact occur in a single time step. An analysis of crashes and 
recoveries is the subject of chapter^! 

It may be useful for the reader to trace the steps through one example run. Figure [5?6l shows 
snapshots of the graph at various times for the run shown in Figure l5^ . which has p = 0.0025. 
The initial random graph (n = 1, Figure lOb ) has 31 links. 55 nodes are isolated with no incoming 
or outgoing links, and there are no cycles. In the attractor only node 13 is non-zero, all others are 
zero. Figures lOb -d show further snapshots of the graph in the random phase. 

The first cycle comprising nodes 26 and 90 forms at n = 2854 (Figure lS^ ). and these nodes 
are the only non-zero ones in the attractor. This marks the beginning of the growth phase. At 
n = 3022 (Figure l5^ ) more nodes get attached to the cycle and their relative populations also 
become non-zero. By n = 3386 (Figure [5?6| -) 28 nodes have added onto the 2-cycle. All these 30 
nodes are non-zero in the attractor, while the other nodes are zero. At n = 3387 (Figure l5^ ) 
there is a sudden drop in si as a new 2-cycle comprising nodes 41 and 98 is formed. This 2-cycle 
is downstream from the earlier 2-cycle, and for such a graph, as explained in section (Example 
7), only the downstream cycle will be non-zero. By n = 3402 (Figure l5?^ ) the situation has not 
altered much, the old 2-cycle still survives by chance. But then node 38 gets removed, which 
breaks the path connecting the cycles. So at n = 3403 (Figure \5^ ) the graph consists of two 
disconnected 2-cycles. Both these cycles and all nodes having access from them are non-zero in 
the attractor. By n = 3488 (Figure IS!Hk ) several nodes have joined the 26-90 cycle. By chance 
none have joined the 41-98 cycle. At n = 3489 (Figure [5?6l ) the 26-90 cycle gets strengthened 
by a second cycle. This subgraph now has a larger Perron-Frobenius eigenvalue and so the other 
cycle is zero in the attractor. After this the dominant ACS keeps accreting nodes until it spans 
the entire graph for the first time at n = 3880 (Figure l5^6l n) signaling the start of the organized 
phase. At n = 4448 (Figure lS^ ) the core is at its largest. Most subsequent graph updates involve 
the removal of a node with no more drastic effect on the dominant ACS. This removed node will 
keep getting repeatedly replaced until it rejoins the dominant ACS. At n = 4695 (Figure [5?6b ) the 
node 36 which is outside the ACS and therefore zero in the attractor, gets replaced and the new 
node rejoins the ACS resulting in the graph at n = 4696 (Figure lO^ )). Notice that node 36 forms 
a 2-cycle with node 74. Thus, this graph update has created a new irreducible subgraph in the 
periphery. At the moment this structure does not affect the dominant ACS because the core has 
a larger Ai. However, a few hundred time steps later this 2-cycle plays a significant role. 



5.4. Results of graph evolution 
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Figure l5^ -f. continued on next page. 
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Figure l5^ -I. continued on next page. 



5.4. Results of graph evolution 
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Figure l5^Hl n-r. continued on next page. 
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Figure [5?^ -x. continued on next page. 



5.4. Results of graph evolution 
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Figure 5.6: (see previous pages also) Snapshots of the graph at various times for the run shown in 
Figure l53b with s = 100 and p = 0.0025. See the text for a description of the major events. In 
all graphs, white nodes are those with Xi = 0. All shaded nodes have Xi > 0. If there is an ACS 
in the graph, the core nodes are coloured black and the periphery nodes grey. The graphs have 
been drawn using LEDA. Data files for these graphs, in LEDA format, and files containing their 
adjacency matrices can be found in the attached CD (see appendix C). 



By re = 5041 (Figure lOl^ ) the core has shrunk to 5 nodes. Node 85 gets removed which 
leaves the graph at n = 5042 (Figure lS^ ) with one 2-cycle (nodes 36 and 74) downstream from 
another (26 and 90). Only nodes 36, 74 and node 11 (downstream from 74) are non-zero in the 
new attractor. All other nodes have become extinct, triggering a drop in si from 100 to 3. Over 
the next thousand time steps the dominant ACS rebuilds itself around the new core comprising 
nodes 36 and 74 (re = 6061, Figure lOfe ). Then node 60 comes in, completing a cycle of 5 nodes 
that is downstream from the 36-74 cycle [n = 6062, Figure 15?^ ) and again there is a large drop 
in si as 36, 74 and many nodes having access from them become extinct. At re = 6070 (Figure 
15. 6L ). however, the 36-74 cycle is resurrected as the path connecting it to the 5-cycle is broken. 
It does not last for too long as the 5-cycle is strengthened by another cycle at re = 6212 (Figure 
15. 6t /) and the 36-74 cycle again becomes extinct. This time it does not revive, as the dominant 
ACS builds around the strengthened 5-cycle. About two thousand time steps later, at n = 8232 
(Figure l5^6l v). the graph is again fully autocatalytic but is on the verge of a major collapse. The 
core is just a 3-cycle and node 54 gets removed, thus destroying the only cycle in the graph. The 
graph at re = 8233 (Figure ls!6k ) is left without an ACS; the system is now once again in the 
random phase. Within a few hundred time steps all of the structure is destroyed and the graph 
at re = 8500 (Figure l5^ ) and re = 10000 (Figure l5^ ) are similar to the initial graph at re = 1 
(Figure l5?6b ). At this point, I will stop following the run but eventually a new ACS arises, grows 
and spans the entire graph, then gets destroyed and another round starts (see Figure l5^ . 
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Chapter 6 

Formation and Growth of 
Autocatalytic Sets 

6. 1 The random phase 

For the run with s = 100, p = 0.0025 displayed in Figure lOb . the initial random graph at n = 1 
contains no cycles, and hence no ACSs, and its Perron-Frobenius eigenvalue is Ai = 0. Figure l6?l] 
shows this initial graph. 

For such a graph Xi > for all nodes that are at the ends of the longest paths of nodes, and 
Xi = for every other node (see section IT^]) . In Figure [611 there are two paths of length 4, which 
are the longest paths in the graph. Both end at node 13, which is therefore the only non-zero node 
in the attractor for this graph. This node, then, is the only node protected from removal during 
the graph update. Node 13 has a high relative population but its supporting nodes do not have 
high relative populations. Inevitably within a few graph updates a supporting node will be removed 
from the graph. When that happens node 13 which presently has non-zero Xi will no longer be at 
the end of the longest path and hence will get Xi = 0. In the run node 34 is replaced at the 8th 
time step. After that node 13 joins the set C of nodes with least Xi. Thus no structure is stable 
when there is no ACS. Eventually, all nodes are removed and replaced. 

Figures [6?2l and lOl compare properties of the graphs from the first random phase of the run 
(from n = 1 to n = 2853, inclusive), with properties of the random graph ensemble, Gf, described 
in section [Z9l The average number of links for the 2853 graphs in the random phase is 25.2, with 
standard deviation 5.3. For the ensemble G§, with s = 100, p = 0.0025, the average number of 
links \sps{s — l) = 24.75. Figure lOl shows that the in and out degree distributions of graphs taken 
from the random phase are close to the binomial distribution expected for G? for smaller values 
of degree but depart from the binomial for larger degrees. Figure lOl compares the dependency 
distribution of the random phase with that of 10^ graphs taken from the ensemble Gf. Again, the 
distributions are similar for smaller dependency values, and differ for higher values. 
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Figure 6.1: The initial random graph, at n = 1, for the run shown Figure l53b . Node 13 is the 
only node with non-zero Xi because it is at the end of the longest path. 



6. 1 . The random phase 
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Figure 6.2: Comparison of the a. in-, and b. out-degree distributions (bars) of 2853 graphs 
fronn the randonn phase (n = 1 to n = 2853 of the run in Figure l53b ) with the binonnial degree 
distribution i3p~^(A;) = *~^Cfcp'^(l— p)'^^^"^ (lines with circles) expected for a random graph with 



100, p = 0.0025. The error bars show the expected standard deviation, 



Br, 



285300 



of the distribution from the binomial for a finite sample of 2853 random graphs each with 100 
nodes. 
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Figure 6.3: Comparison of the dependency distribution of 2853 graphs (bars) fronn the randonn 
phase (n = 1 to n = 2853 of the run in Figure lOb ) with the dependency distribution (shown in 
figureESI) of 10^ random graphs with s = 100,p = 0.0025 (I ine with circles). The error bars show 

the expected standard deviation, \f^^^^^^^, of the distribution for a finite sample of 2853 
random graphs each with 100 nodes (this estimate of the error assumes that the dependencies of 
each node is independently picked from the distribution D{d) and does not take into account any 
correlations that would exist, even for random graphs, between dependencies of nodes of the same 
graph). 



Although the properties of the random phase graphs do not exactly match the properties of 
graphs from the ensemble Gf , the main point about the graph dynamics holds nevertheless: In the 
random phase no graph structure is stable for very long and eventually all nodes are removed and 
replaced. 

Note that the initial random graph is likely to contain no cycles when p is small [ps <^ 1). If 
larger values of p are chosen, it becomes more likely that the initial graph will contain a cycle. If 
it does, there is no random phase; the system is then in the growth phase, discussed below, right 
from the initial time step. 



6.2 The growth phase 

At some graph update an ACS is formed by pure chance. The probability of this happening can 
be closely approximated by the probability of a 2-cycle (the simplest ACS with 1-cycles being 
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disallowed) forming by chance, which is p^s (= the probability that in the row and column cor- 
responding to the replaced node in C, any matrix element and its transpose are both assigned 
unity). Thus, the 'average time of appearance' of an ACS is Tq = and the distribution of 

times of appearance is P{na) = p'^s{l — p^s)"""^. This approximation is better for small p. In 
the run displayed in Figure l53b . a 2-cycle between nodes 26 and 90 formed at n = 2854. This 
is a graph that consists of a 2-cycle and several other chains and trees. For such a graph, I have 
shown in Example 3 in section 14^61 that the attractor has non-zero Xi for nodes 26 and 90 and 
zero for all other nodes. The dominant ACS consists of nodes 26 and 90. Therefore these nodes 
cannot be picked for removal at the graph update and, hence, a graph update cannot destroy the 
nodes and links of the dominant ACS. The autocatalytic property is guaranteed to be preserved 
until the dominan t ACS spans the whole graph. In fact, an even stronger statement can be made 
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Proposition 6.1: Ai is a non-decreasing function of n as long as si < s. 

When a new node is added to the graph at a graph update, one of three things will happen: 

1. The new node will not have any links from the dominant ACS and will not form a new ACS. 
In this case the dominant ACS will remain unchanged, the new node will have zero relative 
population and will be part of £. For small p this is the most likely possibility. 

2. The new node gets an incoming link from the dominant ACS and hence becomes a part of it. 
In this case the dominant ACS grows to include the new node. For small p, this is less likely 
than the first possibility, but such events do happen and in fact are the ones responsible for 
the growth of the dominant ACS. 

3. The new node forms another ACS. This new ACS competes with the existing dominant 
ACS. Whether it now becomes dominant, overshadowing the previous dominant ACS or it 
gets overshadowed, or both ACSs coexist depends on the Perron-Frobenius eigenvalues of 
their respective subgraphs and how they are connected. Again, for small p, this is a rare 
event compared with possibilities 1 and 2. 

Typically the dominant ACS keeps growing by accreting new nodes, usually one at a time, until 
the entire graph is an ACS. At this point the growth phase stops and the organized phase begins. 



6.2. 1 Timescale for growth of the dominant ACS 

Assuming that possibility 3 above is rare enough to neglect, and that the dominant ACS grows 
by adding a single node at a time, one can estimate the time required for it to span the entire 
graph. Let the dominant ACS consist of si{n) nodes at time n. The probability that the new 
node gets an incoming link from the dominant ACS and hence joins it is psi. Thus in An 
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Figure 6.4: Dependence of (the growth timescale of the dominant ACS) on p. Each data point 
shows the average of Inr^ over 5 different runs with s = 100 and the given p value. The error bars 
correspond to one standard deviation of the Inr^ values for each p. The straight line is the best 
linear fit to the data points on a log-log plot. It has slope —1.03 it 0.03 and intercept —0.20 ±0.25 
which is consistent with the expected slope —1 and intercept 0. 



graph updates, the dominant ACS will grow, on average, by Asi = psiAn nodes. Therefore 
si(n) = si(na)exp((n — na)/Tg), where Tg = 1/p, Ua is the time of appearance of the first ACS 
and si{na) is the size of the first ACS (=2 for the run shown in Figure lOb ). Thus si is expected to 
grow exponentially with a characteristic timescale Tg = 1/p. The time taken from the appearance 
of the ACS to its spanning is Tgln{s/ si{na))- This analytical result is confirmed by simulations 
(see Figure lOjl . In the displayed run, after the first ACS (a 2-cycle) is formed at n = 2854, 
it takes 1026 time steps, until n = 3880 for the dominant ACS to span the entire graph. This 
explains how an autocatalytic network structure and the positive feedback processes inherent in it 
can bootstrap themselves into existence from a small seed. The small seed, in turn, is guaranteed 
to appear on a certain timescale [l/p^s in the present model) just by random processes. 

6.3 Fully autocatalytic graphis 

The growth phase continues until the dominant ACS spans the entire graph. In the displayed run 
this happened at n = 3880. Figure l6^ shows the graph at this time - it is a fully autocatalytic 
graph, all nodes are part of the dominant ACS. 



6.3. Fully autocatalytic graphs 



83 




Figure 6.5: The graph, at n = 3880, for the run shown Figure lOb . For the first time in this run 
the graph is fully autocatalytic. All nodes are part of the dominant ACS and are non-zero in the 
attractor. 
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6.3. 1 Probability of a random grapli being fully autocatalytic 

A fully autocatalytic graph such as this is a highly improbable structure for a random graph. Con- 
sider a graph of s nodes and let the probability of a positive link existing between any pair of nodes 
(except self-links) be p* . Such a graph has on average m* = p*{s — 1) incoming or outgoing 
positive links per node. For the entire graph to be an ACS, each node must have at least one 
incoming link, i.e. each row of the matrix C must contain at least one positive element. Hence 
the probability, P, for the entire graph to be an ACS is 

P = probability that every row has at least one positive entry 
= [probability that a row has at least one positive entry]'^ 
= [1 — (probability that every entry of a row is zero)]'^ 

= [1 - {1 - p*y-'^Y 

= [l-{l-m*/{s-l)y-^Y. 

Note from Figure ls!2l that at spanning the number of links is 124, i.e., 0{s). Thus the average 
degree m* at spanning is 0(1). I have found this to be true in all the runs I have done where ps 
was 0(1) or less. 

For large s and m* ~ 0(1), P ~ (1 — e~™*)'^ ~ e""**, where a is positive, and 0(1). Thus, a 
fully autocatalytic graph is exponentially unlikely to form if it were being assembled randomly (or 
picked at random from the random graph ensemble ). In the present model nodes are being 
added completely randomly but the underlying population dynamics and the selection imposed at 
each graph update result in the inevitable arrival of an ACS (in, on average, Tq = 1/p'^s time steps) 
and its inevitable growth into a fully autocatalytic graph in (on average) an additional ~ r^lns 
time steps. 

For the displayed run if we take m* = ps ^ 0.25 we get P f« 3 x 10^^^. Even if we take m* 
to be the average degree of the graph at n = 3880, i.e., m* = 1.24, we get P « 3 x 10^^^; one 
would expect such a fully autocatalytic graph to take ~ 10^^ time steps to form if it were being 
assembled randomly, in contrast to the 3880 time steps it actually took in this run. 

As mentioned in section IT^ one of the puzzles of the origin of life is: how could a highly 
structured chemical network form in such a short time after the oceans condensed on the Earth? 
In this model, a highly non-random structure, a fully autocatalytic set, inevitably forms. Further, 
the time taken for it to form is extremely short, growing only as a logarithm of the size of the 
network. This model, therefore, suggests a mechanism - the growth of ACSs by the accretion 
of nodes around a small initial ACS - by which highly structured chemical networks (that are 
exponentially unlikely to form by chance) can form in a very short time. This mechanism, or its 
analogue in a more realistic chemical model, might explain the emergence of a structured chemical 
organization on the prebiotic Earth. This is discussed further in section l9^ 
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6.3.2 Clustering coefficient 

In several runs with s = 100, p = 0.0025 (totaling 1.55 nnillion iterations) 160659 graphs were fully 
autocatalytic. The mean clustering coefficient of these 160659 graphs was 0.025, with standard 
deviation 0.016, which is connparable to the clustering coefficient for randonn graphs with the same 
connectivity {p* = 1.27/(s — 1), see below). The graph in Figure lOl in fact, has a clustering 
coefficient of zero: none of the neighbours of any node are connected to each other, as can be 
verified from the figure. Thus, the fully autocatalytic graphs produced in these runs do not have 
a small-world structure. 

6.3.3 Degree and dependency distributions 

The in-degree and out-degree distributions of the nodes of all the 160659 fully autocatalytic graphs 
produced in these runs are shown in Figure The mean in-degree for nodes of these graphs was 
1.27, and the standard deviation was 0.50. The mean out-degree was 1.27, with standard deviation 
1.75. The distributions are compared with the binomial distribution Bp^^{k) (as expected for 
random graphs from the ensemble ; p* = 1.27/ {s — 1)) and the shifted binomial Bp~^{k — 1) 
(as would be expected for a random autocatalytic graph). The in-distribution is quite similar in 
functional form to the shifted binomial for small in-degrees but departs from it for higher values. In 
contrast, the out-distribution falls off much more slowly, though it is not scale-free either because it 
is not a straight line in a log-log plot (Figure [6T|l . However, as s = 100, the allowed values for the 
degree of a node cover a range of only two decades. If a portion of the out-degree distribution is a 
power-law then it is more likely to show up for graphs with higher s, where the degree distributions 
would span more decades. For such networks, with larger s, whether the fully autocatalytic graphs 
produced are scale-free or not is an open question. 

The dependency distribution of the fully autocatalytic graphs produced in the runs (Figure lQ]! 
has a substantially larger mean than the distribution for graphs of the random phase (compare 
with Figure This is consistent with the growth of interdependency observed in Figure l54l 

Further, as evident from Figure lOl the dependency distribution is also clearly different from, and 
much narrower than, the distribution for random graphs with a comparable connectivity p* = 
1.27/(s — 1). The mean dependency of nodes of the fully autocatalytic graphs (30.93, with 
standard deviation 14.71) is slightly larger than the mean dependency for 1.5 x 10^ random graphs, 

, with p* = 127/(s - l),s = 100 (23.70, with standard deviation 30.82), but the maximum 
dependency observed for the fully autocatalytic graphs (100) is smaller than the maximum for the 
random graphs (163). In all the fully autocatalytic graphs, the minimum dependency was two, and 
the dependency distribution is small for small values of dependency. In contrast, the dependency 
distribution for the random graphs is maximum at zero dependency. 
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Figure 6.6: Light red bars: in-degree distribution of nodes of 160659 fully autocatalytic graphs 
from several runs with s = 100, p = 0.0025 (totaling 1.55 million iterations). Mean in-degree is 
1.27. Light blue bars: out-degree distribution for the same graphs. Mean out-degree is 1.27. Black 
curve: the binomial distribution Btj^^{k) expected for a random graph G§ ,p* = 1.27/(s — 1); 
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error bars are the expected standard deviation, 
graphs. Red curve: the shifted binomial distribution Bp''^{k - 

autocatalytic set; error bars are the expected standard deviation, 



for a finite sample of 160659 
1) expected for a random fully 
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Figure 6.7: The out-degree distribution of fully autocatalytic sets in several runs with s = 100, p = 
0.0025 (light blue bars in Figure lO]) on a log-log plot. A scale-free distribution would be a straight 
line. 




Figure 6.8: Bars: The dependency distribution of nodes of 160659 fully autocatalytic graphs from 
several runs with s = 100, p = 0.0025 (totaling 1.55 million iterations). Mean dependency is 30.93 
with standard deviation 14.71. Circles: The dependency distribution of 1.5 x 10^ graphs from the 
ensemble , with s = 100, p* = 1.27/(s — 1). The mean dependency is 23.70, with standard 
deviation 30.82. 
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Chapter 7 

Destruction of Autocotolytic Sets 



7. 1 Catastrophes and recoveries in ttie organized and growth) 
phiases 



Once an ACS spans the entire graph the effective dynamics again changes although the microscopic 
dynamical rules are unchanged. At spanning, for the first time since the formation of the initial 
ACS, a node of the dominant ACS will be picked for removal. This is because at spanning all 
nodes belong to the dominant ACS and have non-zero relative populations; one node nevertheless 
has to be picked for removal. Most of the time the removal of the node with the least Xi will 
result in minimal damage to the ACS. The rest of the ACS will remain with non-zero relative 
populations, and the new node will repeatedly be removed and replaced until it once again joins 
the ACS. Thus si will fluctuate between s and s — 1 most of the time. However, once in a while, 
the node that is removed happens to be playing a crucial role in the graph structure despite its low 
relative population. Then its removal can trigger large changes in the structure and catastrophic 
drops in si and I. Alternatively, it can sometimes happen that the new node added can trigger a 
catastrophe because of the new graph structure it creates. 

In Figure [531 which shows si for the same run as that in Figure but for a much longer 
time, one can observe several of the large and sudden drops in si that will be discussed in this 
chapter. These 'catastrophes' in the organized and growth phases are followed by 'recoveries', in 
which si rises on a certain timescale. Figure 1711 shows the probability distribution P(Asi) of 
changes in the number of nodes with Xi > 0; Asi(n) = si{n) — si{n — 1). The asymmetry 
between rises and drops as well as fat tails in the distribution are evident. For low p the probability 
of large drops is an order of magnitude greater than intermediate size drops (also see Figure l53|l . 
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Figure 7.1: Probability distribution of changes in si, the number of nodes with Xi > 0. P(Asi) is 
the fraction of tinne steps in which si changes by an annount Asi in one tinne step in an ensemble 
of runs with s = 100 and p = 0.001,0.0025,0.005. Only time steps where an ACS initially exists 
are counted. 
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Figure 7.2: Crashes are predominantly core-shifts. A histogram of core overlaps for the 701 crashes, 
in which si dropped by more than s/2, observed in several runs with s = 100 and p = 0.0025, 
totaling 1.55 million iterations. The attached CD contains data files for all these runs, as well as 
information about the 701 crashes (see appendix C). 
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Figure 7.3: A graph with Ai = 1.22. The Xi values (rounded off to two decimal places) are shown 
adjacent to each node. Notice that if a node has only one inconning link then its Xi value is 1/Ai 
times the value of the node it is getting the link from. E.g. Xq = X^/Xi. Thus, nodes further 
down a chain of single links have lower Xi. Nodes 6 and 7 have the lowest Xi. 



7.2 Crashes and core-shifts 

Definition 7.1: Crash. 

A crash is a graph update event n for which Asi(n) < — s/2 
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A crash is an event in which a significant fraction (arbitrarily chosen as 50%) of the nodes become 
extinct. In several runs with s = 100, p = 0.0025 totaling 1.55 million iterations there were 701 
crashes. The first task is to see if these large drops in si are correlated to specific changes in the 
structure of the graph. 

Definition 7.2: Core-shift. 

A core-shift is a g r aph up date event n for which Ov{Cn-i,Cn) = 



l|jain and Krishna 



Q)- 



Figure 17^ shows a histogram of core overlaps Ov{Cn-i,Cn) for these 701 crashes. 612 of these 
have zero core overlap, i.e., they are core-shifts. (If one looks at only those events in which more 
than 90% of the nodes went extinct, i.e., Asi(n) < —0.9s, then one finds 235 such events in the 
same runs, out of which 226 are core-shifts.) Of the remaining 89 crashes 10 were events in which 
the core remained unchanged and 79 were events in which the core was affected but some overlap 
remained between the new and old cores. Most of the crashes happen when there is a core-shift - 
a drastic change in the structure of the dominant ACS. 

In the 612 core-shifts, the average number of incoming plus outgoing links is 2.27 for all nodes 
in the graph, 2.25 for the node that is removed and 1.25 for the new node. Thus the nodes 
whose removal or addition causes the crash are not excessively rich in links. 'Nondescript' nodes 
such as these cause system wide crashes because of their critical location in a small core (the 
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average core size at the 612 core-shifts is 6.3 nodes). Core-shifts in which the ACS is completely 
destroyed typically cause the largest damage (of 612 core-shifts these are 136 in number, with 
|Asi| = 98.2 lb 1.2). The remaining 476 core-shifts in which an ACS exists after the core-shift 
have |Asi| = 75.0 ± 14.2. The former constitute an increasing fraction of the crashes at smaller 
p values, causing the upturn in P(Asi) at large negative Asi for small p (Figure m|l . 

7.3 Addition and deletion of nodes from a graph) 

In order to understand what is happening during the crashes and subsequent recoveries I begin by 
examining the possible changes that an addition or a deletion of a node can make to the core of 
the dominant ACS. 

7.3.1 Deletion of a node: keystone nodes 

We have already seen how the deletion of a node can change the core - recall the discussion of 
keystone nodes in section the removal of a keystone node results in a core-shift, i.e., a zero 
overlap between the cores of the dominant ACS before and after the removal. In an actual run 
a keystone node can only be removed if it happens to be one of the nodes with the least Xi. 
However, the core nodes are often 'protected' by having higher Xi. The reason for this is: 

X is an eigenvector of C with eigenvalue Ai. Therefore, when Ai 7^ it follows that for 
nodes of the dominant ACS, Xi = (1/Ai) (^ij^j- If node i of the dominant ACS has only one 
incoming link (from the node j, say) then Xi = Xj/\i\ Xi is 'attenuated' with respect to Xj by 
a factor Ai. The periphery of an ACS is a tree-like structure emanating from the core, and for 
small p most periphery nodes have a single incoming link. For instance, the graph in Figure FOl 
whose Ai = 1.22, has a chain of nodes 4 — > 5 — > 6. The farther down such a chain a periphery 
node is, the lower is its Xi because of the cumulative attenuation. For such an ACS the 'leaves' 
of the periphery tree (such as node 6) will have the least Xi while core nodes will have larger Xi. 

However, when Ai = 1 there is no attenuation. At Ai = 1 the core must be a cycle or a set 
of disjoint cycles (proposition 2.2), hence each core node has only one incoming link within the 
dominant ACS. All core nodes have the same value of X^. As one moves out toward the periphery, 
Ai = 1 implies there is no attenuation, hence each node in the periphery that receives a single link 
from one of the core nodes will also have the same Xi. Some periphery nodes may have higher 
Xi if they have more than one incoming link from the core. Iterating this argument as one moves 
further outward from the core, it is clear that at Ai = 1 the core is not protected and in fact will 
always belong to C (the set of nodes with the least Xi) if the dominant ACS spans the graph. We 
have already seen in section 14^91 that when Ai = 1 and the core is a single cycle every core node 
is a keystone node. Thus, when Ai = 1 the organization is fragile and susceptible to core-shifts 
caused by the removal of a keystone node. 
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Figure 7.4: (opposite page) A hierarchy of innovations. Each node in this binary tree represents 
a class of node addition events. Each class has a name; the small box contains the mathematical 
definition of the class. All classes of events except the leaves of the tree are subdivided into two 
exhaustive and mutually exclusive subclasses (represented by the two branches emanating downward 
from the class). The number of events in each class pertain to the run of Figure lOb with a total 
of 9999 graph updates, between n = 1 (the initial graph) and n = 10000. In that run, out of 
9999 node addition events, most (8929 events) are not innovations. The rest (1070 events), which 
are innovations, are classified according to their graph structure. Xk is the relative population of 
the new node in the attractor of equation ()4.1jl that is reached in step 1 of the graph dynamics 
immediately following the addition of that node. If the new node causes a new irreducible subgraph 
to be created, N is the maximal irreducible subgraph that includes the new node. If not, N = ^ 
(the empty set). Qi is the core of the graph just before the addition of the node (just before step 
3 of the graph dynamics) and Qj the core just after the addition of the node. The six leaves of the 
innovation subtree are numbered from 1 to 6 and correspond to the classes discussed in the text. 
Some classes of events happen rarely (e.g., classes numbered 5 and 6) but have a major impact on 
the dynamics of the system. The precise impact of all these classes of innovations on the system 
over a short timescale (before the next graph update) as well as their probable impact over the 
medium term (upto a few thousand graph updates) can be predicted from the structure of N and 
the rest of the graph at the moment these innovations appear in a run. 



7.3.2 Addition of a node: innovations 

I now turn to the effects of the addition of a node. The removal and subsequent addition of a new 
node in a graph update creates a new graph that is likely to have a different attractor from the 
previous graph. In the new attractor the new node, denoted k, may become extinct, i.e., Xk may 
be zero, or it may survive, i.e., X^ is non-zero. If the new node becomes extinct then it remains 
in the set C and there is no change to the dominant ACS. So I will focus on events in which the 
new node survives in the new attractor. 



Definition 7.3: Innovation. 

An innovation is a graph update event in which the relative population of the new node in 
the new attractor is non-zero, i.e an even t in which the new node survives till the next graph 



update 



mm. mm 



I will also use the term 'innovation' to refer to the new structures created - the new node 
and the new links brought in by the graph update event. Figure ITU shows a graph-theoretic 
classification of innovations in terms of a hierarchy. The innovations that have the least impact 
on the relative populations of the nodes and the evolution of the graph on a short timescale (of a 
few graph updates) are ones that do not affect the core of the dominant ACS, if it exists. Such 
innovations are of three types (see boxes 1-3 in Figure rr4|) : 
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1. Random phase innovations. These are innovations that occur in the random phase when 
no ACS exists in the graph, and they do not create any new ACSs. These innovations are typically 
short lived and have little short or long ternn impact on the structure of the graph. In the displayed 
run the first random phase innovation appeared at n = 79 (node 25 in Figure l5^ ). 

2. Incremental innovations. These are innovations that occur in the growth and organized 
phases, which add new nodes to the periphery of the dominant ACS without creating any new 
irreducible subgraph. See Figure 15. 6F . n = 3022, for the first incremental innovation of the 
displayed run. In the short term these innovations only affect the periphery and are responsible for 
the growth of the dominant ACS. In a longer term they can also affect the core as chains of nodes 
from the periphery join the core of the dominant ACS. 

3. Dormant innovations. These are innovations that occur in the growth and organized 
phases, which create new irreducible subgraphs in the periphery of the dominant ACS (e.g. the 
innovation at n = 4696, Figure [^Jd, that created the 2-cycle 36-74 in the periphery). These 
innovations too affect only the periphery in the short term. But they have the potential to cause 
core-shifts later if the right conditions occur (see section 17.4.31) . 

Innovations that do immediately affect the core of the existing dominant ACS are always ones 
which create a new irreducible subgraph. They are also of three types (boxes 4-6 in Figure IT^Il : 

4. Core enhancing innovations. These innovations result in the expansion of the existing 
core by the addition of new links and nodes from the periphery or outside the dominant ACS. They 
result in an increase of Ai of the graph. The innovation at n = 3489 (Figure lS?^ ) is an example. 

5. Core-shifting innovations. These are innovations that cause an immediate core-shift often 
accompanied by the extinction of a large number of nodes. The innovation at n = 6062 (Figure 
15. 6t ) is an example (also see section rr4.2p . 

6. Creation of the first ACS. This is an innovation that creates the first ACS in a graph 
which till then had none. In the displayed run, the first ACS was created at n = 2853 (Figure 
I5.6b l.e). The innovation moves the system from the random phase to the growth phase. 

Definition 7.4: Core transforming innovation. 

An innovations of type 4, 5 or 6 which imm ediately affects the core o f the dominant ACS 



will be called a core transforming innovation l|Jain and Krishna. l2QQ3ah 



The followi ng proposition makes prec ise the conditions under which a core transforming innovation 
can occur (|jain and Krishna. l2QQ3ah . Let C'^ = C„_i — k denote the graph of s — 1 nodes just 
after the node k is removed from C„_i. will stand for the core of C^. 

Proposition 7.1: Let Nn denote the maximal new irreducible subgraph that includes the new 
node at time step n. Nn will become the new core of the graph, replacing the old core 
Qn~i, whenever either of the following conditions are true: 

(a) Ai(A^„) > XiiQ'J or, 

(b) Xi{Nn) = Xi{Q'n) and Nn is downstream of (5'„. 
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Figure 7.5: Classification of core-shifts into three categories. The graph shows the frequency, /, of 
the 612 core-shifts observed (see Figure lT^ in a set of runs with s = 100 and p = 0.0025 vs. the Ai 
values before, Ai(C„_i), and after, Ai(C„), the core-shift. Complete crashes (black: Ai(C„_i) = 
1, Ai(Cn) = 0), takeovers by core-transfornning innovations (blue: Ai(C„) > Ai(C„_i) > 1) and 
takeovers by dormant innovations (red: Ai(C„_i) > Ai(C„) > 1) are distinguished. Numbers 
alongside vertical lines represent the corresponding / value. 



Such a core transforming innovation will fall into category 4 above if Qn-i C Nn- However, if 
Qn-i and Nn are disjoint, we get a core-shift and the innovation is of type 5 if is non-empty 
or type 6 otherwise. 

7.4 Classification of core-sliifts 

Using the insights from the above discussion of the effects of deletion or addition of a node, 
one can classify the different mechanisms that cause core-shifts. Figure FTsl differentiates the 612 
core-shifts observed among the 701 crashes. They fall into three categories: (i) complete crashes 
(136 events), (ii) takeovers by core-transforming innovations (241 events), and (iii) takeovers by 
dormant innovations (235 events). 



98 



Chapter 7. Destruction of Autocatalytic Sets 




Figure 7.6: A complete crash. In the run displayed in Figure at n = 8232 the core is a 3-cycle 
comprising nodes 20, 50 and 54. Node 54 is the node with the least Xi that is removed from the 
graph. This breaks the only cycle in the graph resulting in a graph, at n = 8233, that has no ACS. 
The system goes from the organized phase to the random phase, Ai drops from 1 to 0, and si 
drops form 100 to 2. 



7.4.1 Complete crashes 

A complete crash is an event in which an ACS exists before but not after the graph update. Such 
an event takes the system into the random phase. A complete crash can occur when a keystone 
node is removed from the graph. For example, in the run displayed in Figure l53b and 15.61 at 
n = 8232 (see Figure lT!^ . the graph had Ai = 1 and its core was the simple 3-cycle of nodes 20, 
50 and 54. As we have seen above, when the core is a single cycle every core node is a keystone 
node and is also in the set £ of nodes with the least Xi. Node 54 was removed, thus disrupting 
the 3-cycle. The resulting graph, at n = 8233, had no ACS and Ai dropped to zero. As discussed 
earlier, graphs with Ai = 1 are the ones that are most susceptible to complete crashes. This can 
be seen in Figure FTsl every complete crash occurred from a graph with Ai(C„_i) = 1. 

7.4.2 Takeovers by core transforming innovations 

An example of a takeover by a core-transforming innovation is given in Figure im taken from the 
example run of Figure (Ob . At n = 6061 the core was a single cycle comprising nodes 36 and 
74. Node 60 was replaced at n = 6062 creating a 5-cycle comprising nodes 60,21,41, 19 and 73, 
downstream from the old core. The graph at n = 6062 has one cycle feeding into a second cycle 
that is downstream from it. We have already seen in section 14^61 (see the discussion of Example 
4) that for such a graph only the downstream cycle is non-zero and the upstream cycle and all 
nodes dependent on it become extinct. Thus, the new cycle becomes the new core and the old core 
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Figure 7.7: Takeover by a core-transforming innovation. In the run displayed in Figure at 
n = 6061 the core is a 2-cycle comprising nodes 36 and 74. The graph update creates a core- 
transforming innovation - a new 5-cycle downstream from the 2-cycle 36-74. As a result the 2-cycle 
and all nodes dependent on it become extinct at n = 6062 and si drops from 100 to 32. Because 
an ACS survives this event, the system is in the growth phase at n = 6062. 




Figure 7.8: Takeover by a dormant innovation. In the run displayed in Figure l5?3b at n = 5041 
the core has two overlapping cycles. Downstream from the core is a 2-cycle comprising nodes 36 
and 74 which formed earlier at n = 4696. Node 85 happens to be one of the nodes with the least 
Xi and is removed from the graph. Thus, at n = 5042, the graph consists of one 2-cycle, 36-74, 
downstream from another 2-cycle, 26-90. As a result, the upstream cycle, 26-90, and all nodes 
dependent on it become extinct and si drops from 100 to 3. Because an ACS survives this event, 
the system is in the growth phase at n = 5042. 
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becomes extinct resulting in a core-shift. This is an exannple of condition (b) for a core-transfornning 
innovation. For all such events in Figure FTsl Ai(Q^) = Ai(C„_i) because k happened not to be a 
core node of C„_i. Thus, these core-shifts satisfy Ai(C„) = \i{Nn) > Xi{Q'^) = Ai(C„_i) > 1 
in Figure FTsl 

7.4.3 Takeovers by dormant innovations 

Dormant innovations, which create an irreducible structure in the periphery of the dominant ACS 
that does not affect its core at that time, were discussed in section r7.3.2l An example is the 2-cycle 
comprising nodes 36 and 74 formed at n = 4696. At a later time such a dormant innovation can 
result in a core-shift if the old core gets sufficiently weakened. 

In this case the core has become weakened by n = 5041, when it has Ai = 1.24 (Figure rTS]) . 
The structure of the graph at this time is very similar to the graph in Figure l4~5b . Just as node 3 in 
Figure l4~5b was a keystone node, here nodes 44, 85, and 98 are keystone nodes because removing 
any of them results in a graph like Figure l4~5b . consisting of two 2-cycles, one downstream from 
the other. 

Indeed, at n = 5041, node 85 is replaced and the resulting graph at n = 5042 has a 2-cycle (36 
and 74) downstream from another cycle (26 and 90). Thus, at n = 5042, nodes 36 and 74 form 
the new core with only one other downstream node, 11, being non-zero. All other nodes become 
extinct resulting in a drop in si by 97. A dormant innovation can takeover as the new core only 
following the removal of a keystone node that weakens the old core. In such an event the new core 
necessarily has a lower (but non-zero) Ai than the old core, i.e., Ai(Cn-i) > Ai(Cn) > 1 (see 
Figure rrs]) . 

Note that 85 is a keystone node and the graph is susceptible to a core-shift because of the 
innovation that created the cycle 36-74 earlier. If the cycle between 36 and 74 were absent 85 
would not be a keystone node because its removal would still leave part of the core intact (nodes 
26 and 90). 

7.5 Timescale of crashes 

I will denote by Hs the time elapsed between successive crashes, not counting the time spent in 
the recovery from the previous crash (i.e., the time from the previous crash to the first subsequent 
spanning of the graph by the dominant ACS). Figure U~9k shows the distribution of values 
for the 701 crashes, discussed above, observed in runs with s = 100, p = 0.0025. The average 
Ts = ^ 1088.3 and the standard deviation of Ug values is ^ 1581.0 for these 701 crashes. 
Figure FTOb shows the distribution of logi^Us values. The distribution and the value must be 
taken with caution because the distribution appears to be a very broad one (the standard deviation 
of ns values is larger than r^) and 701 instances may not be enough to specify it accurately. 
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Figure 7.9: a. Distribution of times elapsed between crashes (not counting the recovery time 
from the previous crash or beginning of the run), n^, for the 701 crashes observed in runs with 
s = 100, p = 0.0025. minjns} = l,max{ns} = 16625, Ts = ^ 1088.3, and the standard 
deviation of the 701 values is ^ 1581.0. The bin size is 400. b. Distribution of logj^QUs values 
for the same 701 crashes. The bin size is 0.1. 




Figure 7.10: a. Comparison of the in-degree distribution of nodes of the 701 graphs just before 
each crash (bars) with the in-degree distribution of all fully autocatalytic graphs in the same runs 
(line with circles), b. Comparison of the out-degree distributions for the same graphs as in (a). 
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7.6 Recoveries 

After a complete crash the system is back in the random phase. In 0(s) graph updates each node 
is removed and replaced by a randomly connected node, resulting in a graph as random as the 
initial graph. Then the process starts again, with a new ACS being formed after an average of 
time steps and then growing to span the entire graph after, on average, (1/p) ln(s/so) time 
steps, where sq is the size of the initial ACS that forms in this round (typically sq = 2 for small p). 

After other catastrophes, an ACS always survives. In that case the system is in the growth 
phase and immediately begins to recover, with si growing exponentially on a timescale 1/p. Note 
that these recoveries happen because of innovations (mainly of type 2 and 4, and some of type 3). 



7.7 Structure of ttie graphi just before a crash) 

The in-degree and out-degree distributions of the nodes of the 701 graphs just before each crash are 
shown in Figure mol The distributions are compared with the distributions for all fully autocatalytic 
graphs produced in the same runs. Both the distributions fall off faster than the distributions for 
all fully autocatalytic graphs. 
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Figure 7.12: Comparison of the distribution of Perron-Frobenius eigenvalues of the 701 graphs just 
before each crash (bars) with the distribution for all fully autocatalytic graphs in the same runs 
(circles). 



Another interesting difference between the graphs just before a crash and all fully autocatalytic 
graphs lies in their dependency distributions: The dependency distribution of the graphs before 
crashes (Figure 17. lip is bimodal, in contrast to the distribution for all fully autocatalytic graphs 
which is unimodal and has a larger mean. 

Figu re compares the distribution of Perron-Frobenius eigenvalues for the 701 graphs with 
the distribution for all the fully autocatalytic graphs. For the graphs just before each of the 701 
crashes the mean Perron-Frobenius eigenvalue is 1.07, with standard deviation 0.10 - clearly lower 
than the mean for the fully autocatalytic graphs, which is 1.43, with standard deviation 0.20. The 
three mechanisms that cause core-shifts, discussed previously, are all more likely to happen with 
graphs whose cores are cycles. Therefore, it is not surprising that the graph just before a crash 
tends to have a Perron-Frobenius value close to unity. 



Chapter 8 

Robustness of the ACS Growtti 
Mechanism 

8. 1 Variants of the model 

In this chapter, I examine the effect of various modifications to the model rules on the formation 
and growth of ACSs. These modifications relax several of the simplifications that have been made, 
which depart from realism but make the model simpler to analyze: 

1. Cij take values and 1 only; all catalysts have the same catalytic efficiency. 

2. Cij take positive values only; negative links - inhibitors - are not present in the model. 

3. Diagonal elements ca are zero; no self-replicators are allowed. 

4. The node with the least Xi is removed at each graph update; extremal selection. 

5. If several nodes have the same least only one is removed. 

6. The total number of nodes is fixed; the rate of node removals and node additions is exactly 
the same. 

7. The population dynamics is linear in the yi variables. 

I consider the following variants of the model, that relax these simplifications: 

1. Variable link strengths. Cjj, when non-zero, are allowed to take any value in the interval 
[0,1]. This relaxes simplification (1). 

2. Negative links. Qj, when non-zero, are allowed to take negative values also. This relaxes 
simplifications (1) and (2). 
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3. Self-replicators. The diagonal elements, ca, are allowed to take non-zero values. This 
relaxes simplification (3). 

4. Non-extremal selection. Simplification (4) is relaxed in three variants, in which the dy- 
namics is made non-extremal in different ways: 

(a) A fixed fraction / of the nodes with the least Xi are replaced at each graph update. 

(b) A single node is picked for removal, with a probability proportional to 1/Xj, at each 
graph update. When some nodes have Xi = then one of them is picked randomly. 

(c) The selection depends on a parameter, denoted q. At each graph update, the node 
to be removed is selected from the set of nodes with least Xj with a probability q, or 
randomly from all nodes with a probability I — q. 

5. Variable number of nodes. A fixed threshold xt is chosen and at each graph update all 
nodes with Xi < xt are removed and one new node is brought in. This relaxes simplifications 
(5) and (6). 

6. Different population dynamics. An equation for Xi in which the first term is quadratic in 
the relative population variables is used instead of equation l|4.1j) . This relaxes simplification 
(7). 

8.2 Variable link strengths 

The first modification I examine is allowing Cij to take values in the interval [0, 1], instead of just 
the values or 1. The initial graph is constructed as follows: for each ordered pair with 
^ 7^ j' is non-zero with probability p and Cij = with probability 1— p. If non-zero, the value 
of Cij is randomly picked from the interval [0,1] with uniform probability. As before ca is set to 
zero for all i to disallow self-replicators. Step 3 of the graph dynamics - addition of a new node - 
is similarly modified. Apart from these alterations the model remains unchanged. 

Figure IHTTl shows a run with these modifications in the rules. The same behaviour is seen. 
There is a 'random phase' where there is no ACS and the graph remains random. The chance 
formation of an ACS on an average timescale Tq = 1/p'^s triggers the 'growth phase' in which the 
number of non-zero nodes grows exponentially with a characteristic timescale Tg = 1/p, until the 
ACS spans the entire graph. 

This is expected because most of the analytical results hold, with minor modifications, for 
this variant too. The Perron-Frobenius theorem holds for the adjacency matrices of the graphs 
produced because they are still non-negative. The only significant modification in the analytical 
results is that now Ai can take values between and 1 also. For example, consider the 2-cycle 
shown in Figure [8?2l The strengths of the two links are a and b, any real numbers in the interval 
[0, 1]. For such a graph Ai = \/a6. 
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Figure 8.1: A run with s = 100, p = 
take any value in the interval [0,1]. 
number of links versus n. 



0.0025 with variable strength links: Cij when non-zero can 
a. number of nodes with Xj > 0, si, versus time, n. b. 
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Figure 8.2: A two cycle with link strengths a,b £ [0, 1]. The Perron-Frobenius eigenvalue of this 
graph is ^/ab. 



Thus, the condition Ai > 1 is changed to Ai > in all propositions that contain it. Proposition 
2.1(ii) is modified to read: If a graph C has a closed walk then Ai > 0. Proposition 3.2(iii) is 
similarly modified: If a graph C has an ACS then Ai > 0. 

The only difference this makes to the dynamics of the system is that in this variant it is much 
less likely for two disjoint ACSs with the same Ai value to exist in some graph. In the original 
model two or more ACSs, typically cycles, often coexisted in runs with small p (see Figures IS^ .u). 
With variable link strengths even if two disjoint cycles form the chance of their having exactly the 
same Ai value is negligible. 



8.3 Negative links: emergence of cooperation 

Now I generalize the model to include inhibitory reactions (where a species can decrease the rate 
of production of another species) along with variable link strengths. I implement this by modifying 
the way links are assigned in the initial random graph and to the new node at each graph update: 
if Cij is chosen to be non-zero then it is assigned randomly a value in the interval [—1, 1] if i / j 
or [—1,0] \i i = j (self-replicators continue to be disallowed but self-inhibitors are possible). With 
negative links the populations can become negative. To prevent this a cutoff must be put on iji 
when yi becomes zero: 

{Tj if % > or Tj > , , 

if = and < 

s 

where = ^ djVj - (j)yi. 
i=i 

For simplicity I retain the form of l|4.2j) although a more realistic chemical interpretation would 
require iji to be proportional to yi for inhibitory reactions. 

The relative population dynamics that follows from l|8.ip using Xi = Vi/ Yl'^j=iV3 'S- 

. / /, if Xi > or /i > 

Xi = { (8.2) 

10 if Xj = and /« < 



s s 

where fi = '^CijXj - Xi CkjXj. 

j=l k,j=l 
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Figure 8.3: A graph that has a limit cycle attractor. The negative links, each of strength -1, form a 
cycle. The positive links, each of strength 1, form another cycle running in the opposite direction. 
In the attractor each Xj oscillates around a mean value 1/3 with a frequency 27r\/3. The amplitude 
of oscillation depends on the initial condition but is the same for each Xi. 



The cutoff in the population dynamics makes it a nonlinear equation. As a result now the 
attractor need not be only a fixed point. For most graphs I have encountered in simulations the 
attractor of ()8.2jl has been a fixed point. In less than 0.2% cases the attractor has been a limit 
cycle where the relative populations of the nodes oscillate. I have not observed any other types 
of attractors. The simplest type of graph having a limit cycle attractor consists of 3 nodes and 
is displayed in Figure lOl For this graph each Xi oscillates around a mean value of 1/3 with a 
frequency 2tt\/^. The amplitude of oscillation depends on the initial condition but is always equal 
for the 3 nodes. If the amplitudes are smaller than 1/3 then none of the Xi hit zero and the cutoff 
in ()8.2jl plays no role in the oscillations. However if arbitrarily random small changes are made to 
the interaction strengths of the graph the oscillations of the Xj start hitting zero and the cutoff 
comes into play. In general the cutoff seems to play an important role in limit cycle attractors. It 
is only in some special cases and for particular initial conditions that the relative populations of 
nodes oscillate without hitting zero. Numerical observations indicate that the property of having 
a limit cycle attractor is structurally stable, i.e., a limit cycle attractor remains if sufficiently small 
changes are made to the interaction strengths of a given graph having a limit cycle attractor. 

The theorems used to analyze the attractors of the original model depended on the matrix C 
being non-negative and hence no longer hold for the system with negative links. However numerica l 



simu 



ations of the system show behaviour very similar to the original system ()Jain and Krishna 



20011 ) despite the slight bias toward negative links (because self-inhibitors are allowed but not 
self-replicators). Figure lOl shows the results of a run with s = 100 and p = 0.005. The number 
of nodes with Xj 7^ 0, si, after the n*'' addition of a new node, the number of positive [cij > 0), 
l+, and negative links {cij < 0), are displayed. Similar to the original model, the curves have 
three distinct regions. Initially si is small; most of the nodes have zero X^. Z+ and /_ also do not 
vary much from their initial value = 25) and remain approximately equal. In the second 

region si and Z+ show a sharp increase while L decreases. In the third region the growth of si and 
1+ stops and almost all nodes have non-zero Xi in contrast to the initial period. The number of 
positive links have increased to several times their initial value while the number of negative links 
has declined to about 10. I have observed the same qualitative behaviour in several runs with p 
values ranging from 0.00002 to 0.01 and s = 100,150,200. 




Figure 8.4: A run with s = 100, p = 0.005 where negative links are allowed, a. the number of 
nodes with Xi > 0, si, versus time, n. b. the number of positive, Z+, and negative links, Z_, 
versus n. 
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Figure 8.5: Example of a destructive parasite. A new node, 3, added to an existing 2-cycle, destroys 
the 2-cycle because of the negative link to node 1. The attractor is X = (0, 0, 1)^. 



Again the explanation for the above behaviour lies in the formation and growth of autocatalytic 
sets. The definition of an ACS has to be slightly modified: an ACS is a subgraph, each of 
whose nodes has at least one incoming positive link from a node belonging to the same subgraph. 
Proposition 3(i) also has to be similarly modified: An ACS must contain a cycle of positive links. 

Upto n = Ua = 1904, there is no ACS; the graph has no cycles of positive links. For such 
graphs, I find that Xi = for most nodes except a few at which the longest paths of positive 
links terminate, with intermediate nodes not having incoming negative links. This is a numerical 
observation that is similar to proposition 4.2. Thus most nodes have Xi = 0. Nodes with Xj / 
will also join the set C of nodes with least Xi if one of their supporting nodes is picked for removal 
from the system. Hence, all nodes are susceptible to be removed sooner or later; the graph remains 
as random as the initial graph and so si, Z+ and Z_ hover around their initial values. The dramatic 
increases in si and Z+ are triggered by the chance formation of an ACS at n = n^. In this system 
too an ACS will inevitably form by pure chance sooner or later. The 'time of appearance' of an 
ACS can again be approximated by the time of formation of a 2-cycle of positive links. As one 
wants the 2-cycle to contain positive links only, p in the previous formula for Ta has to be replaced 
by p/2. Thus « 4/p^s{= 1600 for p = 0.005 and s = 100) and P^) = 2^(1 - 

From n = 1904 to n = 3643 (when si first reaches s), I again find that the set of nodes 
with / is an ACS; the 'dominant ACS'. This is also a numerical observation. The proof 
of proposition 4.3 no longer holds but the system nevertheless shows identical behaviour in this 
respect to the system with only positive links. Hence, as long as the dominant ACS does not 
span the entire graph, i.e. si < s, there are nodes outside it with Xi = which form the set 
Therefore, none of the nodes in the dominant ACS can be removed in the next graph update. 

All the processes that can happen in the original model when a new node is brought in (num- 
bered 1, 2 and 3 in section 16. 2p happen in this system with the same effects. The only new 
possibility to which the presence of negative links leads is when the new node joins the dominant 
ACS and some of the other nodes in the dominant ACS receive negative links from it. Then those 
nodes may become extinct causing si to decrease. An example of this is shown in Figure ISTsl A 
new node, 3, is added to a previously existing 2-cycle. It gets a positive incoming link from node 
2 and a negative outgoing link to node 1. This negative link destroys the ACS. In the attractor 
only node 3 is non-zero while node 1 and 2 have X^ = 0. Node 3 is like a destructive parasite that 
feeds on the the host ACS (nodes 1 and 2) to increase its own relative population, in the process 



112 



Chapter 8. Robustness of the ACS Growth Mechanism 




-12 -11 -10 -9 -8 -7 -6 -5 

ln(p/2| 



Figure 8.6: Dependence of Tg on p for the model with negative links. Each data point shows the 
average of Inr^ over 10 different runs with s = 100, xq = 10~^ and the given p value. The error 
bars correspond to one standard deviation of Inr^ values for each p. The best fit line has slope 
—1.02 ±0.03 and intercept —0.08 ±0.27, consistent with the expected slope —1 and intercept 0. 



destroying the host. The negative link is essential for the destructive property of such parasites; 
in the system with only positive links only benign parasites are possible that do not harm the ACS 
upon which they are feeding. This example also shows that propositions 4.3 and 6.1 are no longer 
true; the set of nodes with Xj > need not be an ACS even if an ACS exists in the graph and Ai 
can decrease while si < s. However, destructive parasites occur rarely and do not reverse the trend 
of increasing si - in the displayed run they formed 6 times at n = 3388,3478,3576,3579,3592 
and 3613, and in each case only resulted in si decreasing by 1. Because the dominant ACS grows 
by adding positive links to the existing dominant ACS, the number of positive links increases as the 
ACS grows. On the other hand nodes receiving negative links often have Xi = 0, hence negative 
links get eliminated when these nodes are removed. 

One can calculate the growth rate of the dominant ACS in exactly the same way as for the 
system with positive links because destructive parasites appear rarely. The only difference is that 
p must be replaced by p/2 because that is the probability for the added node to get a positive link 
to/from any other node. Thus once an ACS forms, si locally averaged in time grows exponentially 
with a characteristic timescale Tg = 2/p. This is confirmed by numerical simulations (Figure lS^ . 

This variant of the model, apart from being an important test of the robustness of the ACS 
mechanism, also allows one to address questions concerning the emergence of 'cooperation' because 
negative links can arise in the network. The negative links in the network can be considered 
'destructive' links as they cause one species to decrease the population growth rate of another. In 
contrast, the positive links in the network can be considered 'cooperative' links. The increase of 
the ratio of positive to negative links, from its initial value of unity to value more than an order of 
magnitude larger in Figure lOI is evidence of the emergence of cooperation during the run. 
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Figure 8.7: A run where self-replicators are allowed. Diagonal elements cu, when non-zero, are 
chosen randomly from the interval [—1,1]. The run has s = 100, p = 0.005. 



8.4 Self-replicators 

In the original model, and all the variants discussed so far, self-replicators, i.e., self-links from a 
node to itself, are disallowed. As discussed in section l5'.2.3l this was done to see whether structured 
chemical networks can arise even in the (forced) absence of individual self-replicators. As shown 
in chapter El structured networks do indeed form. The question that now arises is: Will allowing 
self-links interfere with the formation and growth of other ACSs? In fact, it does not, as evidenced 
by Figure [8?7l which shows a run with negative links in which self-replicators were allowed, i.e., for 
the initial graph and for the new node cu (when non-zero) was chosen randomly from the interval 
[—1,1]. The probability for a self-replicator to form in a graph update event is p/2. The probability 
for a 2-cycle to form is p^s/A. These are comparable when p ^ 2/s. Thus for small enough p, 
self-replicators do form, and often trigger the growth phase, but their role in the network dynamics 
is similar to that of any other ACS. Self-replicators have a Perron-Frobenius eigenvalue Ai = 1 
and are, therefore, no more robust than cycles. ACSs with larger Perron-Frobenius eigenvalues 
out-compete self-replicators. 
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8.5 Non-extremal selection 

The rule for removing a node can be nnade non-extremal in various ways. Instead of removing a 
single node with the least Xi one can remove more than one node or remove nodes with higher 
Xi also. However, even when removing nodes which do not have the least Xj it is reasonable to 
allow some bias in favour of nodes with a higher X^. I consider the following possibilities: 

1. A fixed fraction / of the nodes with the least X-i are replaced at each graph update. 

2. A single node is picked for removal, with a probability proportional to l/Xi, at each graph 
update. When some nodes have = then one of them is picked randomly. 

3. The selection depends on a parameter, denoted q. At each graph update, the node to be 
removed is selected from the set of nodes with least Xj with a probability q, or randomly 
from all nodes with a probability 1 — q. 

Figure lOb shows a run with s = 100, p = 0.005 with 2 nodes removed at each graph update; case 
1 with / = 0.02. Figure lOb shows a similar run with 10 nodes removed at each graph update. 
Figure lOl shows a run with s = 100, p = 0.005 for case 2. In these three runs the behaviour 
is similar to the earlier models. An ACS forms by chance and grows until most of the graph is 
autocatalytic. In these cases the nodes comprising the dominant ACS are protected from removal 
until the size of the ACS becomes close to s. Only after the ACS has become more than s — 2 
for Figure lOb . more than s — 10 for Figure lOb . or has reached s for Figure lOl do nodes of the 
dominant ACS start getting removed in the graph updates. 

Case 3 is designed to interpolate between the extremal selection of the original model [q = 1) 
and the completely random run where there is no selection {q = 0, see black curve in Figure l5^ . 
Figure l^TTni displavs what happens for different values of q. The basic mechanism of an ACS forming 
and growing remains but as q is decreased the ACS becomes increasingly short lived and susceptible 
to destruction. Suppose the ACS has z indispensable nodes, any of whose removal would destroy 
the ACS. Then the probability that an indispensable node will be removed is (1 — q)z/s. Hence, 
there exists a timescale T„e = s/{z{l — q)} over which an ACS will get destroyed due to non- 
extremality. On the other hand for small p ACSs get destroyed even with extremal dynamics over 
a timescale r^. z does change with time but by choosing q sufficiently close to 1, one can make 
Tne ^ Ts. In that case the behaviour would be the same as in the case of extremal dynamics. This 
is confirmed by the run in Figure ISTOfc . where an ACS forms and grows until it spans the entire 
graph. After spanning it does not get destroyed for a substantial period of time. 




Figure 8.8: Non-extremal runs of type 1 with s = 100, p = 0.005. a. at each time step 2 nodes 
are replaced, b. at each time step 10 nodes are replaced. 
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Figure 8.9: Non-extremal run of type 2 with s = 100, p = 0.005. At each time step a single node is 
replaced. The node is chosen with a probability proportional to 1/Xi. If some nodes have Xi = 
then one of them is randomly chosen. 



These non-extremal runs seem to indicate that the ACS mechanism is stable to the introduction 
of non-extremality. In an ACS, organizationally important nodes, such as the core nodes, tend to 
have high relative populations and, therefore, these nodes are protected if there is a selective bias 
that favours removing nodes with less relative population. When the mechanism for choosing a 
node for removal is not based on the relative population of nodes, as (partially) in case 3, this 
protection is lost and the ACS is not as robust a structure. 
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Figure 8.10: Non-extremal run of type 3 with s = 100,p = 0.0025. a. q = 0.95, h. q = 0.99, c. 
q = 0.99999. 
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8.6 Variable number of nodes 

Another idealization in the original model is that at each time step exactly one node is removed 
and one node is introduced leaving the total number of nodes constant. A variant in which the 
number of nodes is not constant is the following: at each time step, all nodes with Xi below a 
fixed threshold, xt, are removed (more than one node can get deleted and the number of nodes is 
not constant). The rest of the procedure remains the same; one new node is added, the population 
dynamics is allowed to run its course with the new graph, and this is iterated many times. Notice 
that it is also possible for none of the nodes to be removed in a particular time step if they all have 
relative populations above xt- This is not possible in the original model. 

This variant too shows the same qualitative behaviour (see Figure [HTT|l . When there is no ACS 
in the graph, none of the nodes remain above the threshold for long. The graph remains small, 
with few nodes and links. The chance formation of an ACS changes things. Again the number of 
links and the number of nodes are seen to rise sharply as more and more nodes get attached to the 
dominant ACS. A third region analogous to the organized phase can also be seen, once the ACS 
grows large enough for some of its nodes to have Xi values below the threshold, but the transition 
is not so clear because there is no upper cutoff for the number of nodes in the graph. 

Because only positive links are allowed all the analytical results, including theorems 3.1 and 
4.1, hold for this variant too. In the initial graph, if there is no ACS, most of the nodes will have 
Xi = 0. All of these nodes will be removed from the graph leaving a graph with very few nodes 
and only those links assigned to the new node. Throughout the random phase the graph will 
consist of only very few nodes. As soon as the first ACS forms the graph will consist only of that 
ACS and the added node. The ACS grows whenever the new node joins it. Here there is a slight 
difference from the original model. The rare events where several nodes, including the new node, 
simultaneously join the dominant ACS are not possible in this variant. It is also not possible for an 
ACS that went extinct for some reason to be resurrected (as happened in Figures [5?6) .u) - in this 
variant the extinct nodes would all be removed from the graph immediately. 

Despite the absence of an explicit upper cutoff, the number of nodes cannot grow indefinitely. 
Because ^i=iXi = 1 always, the number of nodes cannot grow much larger than 1/xt. In 
the run shown above the threshold is xt = 0.005 and the number of nodes remains well below 
200 = 1/0.005. 
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Figure 8.11: A run with variable number of nodes. At each time step all nodes with Xi < 0.005 
are removed. A single new node is added and assigned links with existing nodes with a probability 
p = 0.005. a. the number of nodes in the graph, si, as a function of time, n. b. the total number 
of links in the graph as a function of n. 
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8.7 Different population dynamics 



Replicator equation. 

A variant where the qualitative be haviour does change drasticall y is one where equation l|4.1 



is replaced by the replicator equation ijlHofbauer and Sigmund 



Xi 



Xi 



k,j=l 



(8.3) 



The nanne 'replicator' com es from the fact that the equation models self-replicating s pecies, for 



example in model ecosystems (jHofba uer and Sigmundl.ll988l: 



Drossel and McKane 



20031 ). Because 



Xi is proportional to Xi, the relative population of species i can only grow if it is non-zero in the 
first place; once Xj = it will always stay zero, as expected for self-replicators. The replicator 
equation can have a variety of attractors depending on the coupling constants Cij and the initial 
conditions - fixed points limit cycles, heteroclinic cycles and chaotic attractors have been observed 
( Hofbauer and Sigmundl. 1988). The rep l icator equation has also been used in the hypercycle model 
of the origin of life l|Eigen and Schuster!. Il979l ). 

The other rules of the model are unchanged, making it an evolving version of the hypercycle 
model. Figu re l8?T2l shows a typical run. When there is no ACS the number of nodes with non-zero 
Xi fluctuates wildly, even reaching as high as 40 or more for short periods of time. However, 
no graph structure is stable for very long and the graph remains random. The times when there 
is an ACS are the periods in Figure \8A2l when si remains at some small constant value for a 
large number of graph updates. Yet, nothing resembling a growth phase is seen and the system 
eventually relapses into the random phase. What is happening is that small (hyper)cycles do form 
but quickly get destroyed when a graph update creates a parasite. This observe d behaviour is 
consi s tent with existing work o n well-stirred hypercyc l es an d replicator networks l|Niesert et al 



19811: iHannel and Stad 



iNiesert et al 



eri. Il998l: iTokita and Yasutomil. 12003^ 



( 19811 ) have identified three instabilities to which hypercycles are susceptible. 
Two are of relevance here: Figure [8?T3b illustrates the parasite instability mentioned above. For 
the graph shown here, if a > 1, the attractor of equation ()8.3|l is X = (0,0,1)^ for generic 
initial conditions. Thus, the nodes of the core cycle have the least Xi. Figure l8?T3b illustrates the 
short-circuit instability. For generic initial conditions the system always reaches an attractor where 
node 4, part of the longer cycle, is zero - in general, when long hypercycles are short-circuited by a 
new link, only the smaller hypercycle survives. These two instabilities are responsible for preventing 
a large, structured network from forming when the replicator equation l|8.3jl is used in place of 
equation ()4.ip for the population dynamics. 
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Figure 8.12: Number of nodes with Xi > 0, si, versus time, n, for a run with the replicator 
equation used for the population dynamics. For this run, s = 100, p = 0.0025. 



a) b) 



Figure 8.13: a. Example of the parasite instability of hypercycles. The links between nodes 1 
and 2 are of strength unity. When a > 1 the attractor of the replicator equation for this graph 
is X = (0,0,1)^ for generic initial conditions, b. Example of the short-circuit instability of 
hypercycles. All link strengths are unity. The attractor of the replicator equation for this graph is 
X = (1, 1, 1, 0)-^/3 for generic initial conditions. 
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The replicator equation is a special case of a 
quadratic in the relative population variables: 
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more general equation in which the first term is 



s s 

Xi ^ ^ Aiji^XjX}^ Xi ^ ^ Aiji^XjX}^. (^■^) 

j,k=l l,j,k=l 

This equation preserves the normalization of Xi because Yli=i — 0- The equation models a 
chemistry where two catalysts are (simultaneously) required for the production of each molecular 
species. It reduces to equation ()8.3|) in the special case where one of the catalysts is the product 
itself: Aijk = SijCjk, where 6ij = 1 if i = j and zero if i / j. 

In this case, the simplest ACS would consist of three species: species 1 and 2 catalyze species 3, 
species 2 and 3 catalyze species 1, and species 3 and 1 catalyze species 2 . Consider the case where 
one periphery node is added to this ACS - let species 1 and 2 also catalyze the production of species 
4, and let species 4 not be a catalyst for any of the other species. Numerical simulations show 
that, unlike the replicator case, here both the core and the periphery nodes have non-zero relative 
populations in the attractor for generic initial conditions. The ratio of the relative populations of 
the periphery to the core nodes depends on the link strengths but the core nodes 1, 2 and 3 remain 
with non-zero relative populations even if the link to the periphery node is stronger than the links 
between core nodes. Therefore, unlike the special case of the replicator equation, when equation 
()8.4jl is used for the population dynamics it is possible for small ACSs to grow by adding periphery 
nodes. 



Chapter 9 

Concluding Remarks 



In this thesis I have studied a model of an evolving catalytic network. I have discussed some of 
the issues and questions of interest related to the evolution of networks in section [L6l Here, I 
summarize the interesting phenomena observed in the model, their implications for some of those 
questions, as well as the limitations and possible extensions of the model. 

9. 1 Interesting features of thie model 

1. Rich dynamical behaviour: three phases and multiple timescales. For small p the 
system always appears to be moving from one unstable, or quasi-stable, state to another. It 
never appears to reach a stable state. Three 'phases' of behaviour are observed (see section 
I5.4jl . A 'random phase' in which the graph remains sparse and consists only of trees, chains 
and isolated nodes. The 'growth phase' is triggered by the chance formation of a small ACS 
which then grows by accreting nodes to itself. Finally, the 'organized phase' begins when the 
ACS spans the entire graph. There are transitions from each phase to other phases (except 
that there is never a transition from the growth phase to the random phase). The appearance 
of different structures dynamically generates multiple timescales: in the random phase, Tg 
in the growth phase and in the organized phase (see sections lOl and rTs]! . 

2. Growth of structure, interdependence and cooperation. Starting from an initial random 
graph, the system spontaneously self-organizes into a highly structured graph. This process is 
triggered by the chance, but inevitable, formation of an ACS, which then grows by accreting 
other nodes to it, until eventually the entire graph becomes an ACS (section I6.2jl . The 
resultant fully autocatalytic graph is highly non-random, occupying an exponentially small 
(as a function of s) volume of graph space (section I6.3.1j) . Nevertheless starting from a 
sparse graph the dynamics inevitably leads the system into this exponentially small volume in 
a time that grows only as Ins. The growth of exponentially unlikely chemical organizations 
on the prebiotic Earth in a short time is a puzzle that has been discussed in section fLOl 
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The growth of autocatalytic sets in this model might shed some light on this puzzle - the 
implications for the origin of life are discussed further in section [9?2l Along with the growth of 
non-random structure, the interdependence of the species also grows with the ACS. The fully 
autocatalytic sets that form have an interdependency almost two orders of magnitude larger 
than that of random phase graphs (sections 15.41 and l6.3.3jl . A variant of the model, where 
negative links are also allowed, exhibits the emergence of cooperation. The formation and 
growth of ACSs in this variant is accompanied by an increase in the number of cooperative, 
positive links between species and a decrease in the number of destructive, negative links. 
The ratio of cooperative to destructive links rises from unity, in the random phase, to an order 
of magnitude larger in the organized phase (section [0|) . The formation and growth of ACS 
might be one of the mechanisms responsible for the spontaneous emergence of cooperation 
in a variety of contexts. 

3. Structure of fully autocatalytic graphs. The graphs produced at the end of the growth 
phase have several interesting structural features. Firstly, they are fully autocatalytic. How- 
ever, they are not random fully autocatalytic graphs, as evidenced by their broad, though 
not scale-free, out-degree distribution which is displayed in Figures and I^Tl (however, 
whether runs from other parameter ranges produce scale-free graphs or not remains an open 
question). An interesting feature is that the in-degree distribution spans a much smaller 
range than the out-degree distribution. While the fully autocatalytic graphs have a small 
clustering coefficient (section [6. 3. 2p . they do consist of an irreducible core that is more clus- 
tered, from which grow the chains and trees that form the periphery. These observations 
suggest that, in addition to the commonly studied properties of small-worldness and scale- 
freeness, it might be interesting to examine real networks for properties such as autocatalysis, 
a core-and-periphery structure, a difference between the in- and out-degree distributions, etc. 

4. Large extinctions: destruction of ACSs. ACSs not only form and grow, but over long 
times also get destroyed. The destruction of an ACS is usually sudden and accompanied 
by the extinction of a large number of species, a 'crash'. The asymmetry between rises 
and drops of si as well as fat tails in the distr ibution of crash sizes fFig ure 17.111 are seen 



in the fossil record I^Newm ^and PalmeJ. Il999l and in financial markets ( Bouchaud . 2QQ1 



Johansen and Sornette. .20011 ). An interesting feature of the distribution of drops in si is it's 



bimodal structure for low p - the probability of large drops in si is an order of magnitude 
greater than intermediate size drops. A similar bim odality is observed i n distr ibution of 



earthquake sizes in an earthquake model described bv lCarlson and Lang erl (|l989i ). but the 
mechanism causing it is different. 

5. Causes of large extinctions: core-shifts, innovations and keystone species. A variety 
of mechanisms have been suggested as causes of large extin ction events in macroevolution 
^Mavnard-Smithl . .1994) and finance jBouchaudl . boOll : Ijohansen and Sornettel . 
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20011 ). The largest of the extinction events in this nnodel are usually core-shifts - a discon- 
tinuous change of the core of the dominant ACS in a single graph update. It is interesting 
that such core-shifts can be caused both by the removal of an important 'keystone' species 
as well as by graph structures created by the new species, 'innovations' (section [Oil- One 
can distinguish two processes involving innovations, both having analogues in the real world. 
One is exemplified by the appearance of the automobile that made the horse drawn car- 
riage and its ancillary industries obsolete. This is like the example of the core-transforming 
innovation shown in Figure U~7\ where the graph update produced an irreducible structure 
that was 'stronger' than the existing core. This structure became the new core, driving the 
old core and nodes dependent on it to extinction. The subsequent development of other 
industries dependent on the automobile mirrors the growth of the ACS around the new core. 
The second process is exemplified by the emergence of the body plans of several phyla that 
are dominant today. It is believed t hat while these body p lans originated in the Cambrian 



nat wnile tnese body p i 
jvalentine et al.l.llQQQ|l. 



era more than 520 million years ago ([Valentine et al.l. Il999l ). the organisms with these body 



plans played no major role till about 250 million years ago. They started flourishing only 
when the Per mian extinction depleted the other species that were dominant till that time 



19961 ). This is similar to the events shown in Figure FTSl where an earlier innovation 
had lain dormant for a while without disturbing the existing core, but when the latter became 
sufficiently weak, took over as the new core. 

Predicting crashes. Certain graph theoretic markers could possibly be used to predict 
whether a crash is imminent (section rTT]! . The Perron-Frobenius eigenvalue, Ai, is one such 
marker. Most of the large extinctions are core-shifts, caused by the removal of keystone 
species, takeover by core-transforming innovations, or takeover by dormant innovations. The 
graph is most susceptible to these core-shifts when its Ai = 1 or close to it. The dependency 
distribution is another marker: it is bimodal for graphs preceding a crash. The in and out- 
degree distributions also appear to fall off faster, as a function of degree, for graphs just before 
a crash, than for the fully autocatalytic graphs produced in the same runs. In combination, 
these different markers might indicate whether a given graph has an enhanced probability of 
suffering a crash. 

Analytical tractability of the model. The analytical tractability of the model is a result of 
the linearity of the yi dynamics. Equation ()4.1jl is nonlinear but, as it originates via a nonlinear 
change of variables from a linear equation, equation l|4.2jl . its attractors can be easily analyzed 
in terms of the underlying linear system. The attractors are always fixed points and are just 
the Perron-Frobenius eigenvectors of the adjacency matrix of the graph. Theorems 3.1 (the 
eigenvector profile theorem) and 4.1 (the attractor profile theorem) form the backbone of 
the analytic results about the population dynamics. Note that while the yi dynamics in the 
present model is essentially linear as long as the graph is fixed, it is highly nonlinear over 
long timescales. Because of the coupling of the population and graph dynamics, over long 
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timescales the 'coupling constants' Cij in equation l|4.2|l are not constant but implicitly depend 
upon the yi, thus making the evolution highly nonlinear. The simplifying device of widely 
separated timescales for the graph dynamics and the population dynamics (the population 
variables reach their attractor before the graph is modified) results in a piecewise linear yj 
dynamics that is easier to analyze. The analytic results about the population dynamics have 
been used in the analysis of the graph dynamics: results such as proposition 6.1, as well as 
average timescales of appearance and growth of an ACS use insight about the attractors of 
the population dynamics to predict aspects of the graph dynamics. 

8. Predictability and historicity in the network evolution. There is an interesting mixture of 
predictability and historicity in the model dynamics. The detailed structure of the dominant 
ACS is dependent on the chance events that happen throughout the run. Many complicated 
processes are possible, such as formation of new disconnected ACSs or the reinforcement of an 
old overshadowed ACS before it was completely broken up (Figures Is^ .u). Thus, the actual 
growth of the ACS is usually very complicated with the dominant ACS often undergoing 
drastic changes in structure caused purely by chance events. However, despite the historical 
particularities of a given run several aspects are predictable. The non-decreasing nature of 
Ai in the growth phase is a rule that holds for every run with any values of the parameters s 
and p (see proposition 6.1, section l6^ . Other results, such as the timescales of appearance 
and growth of the ACS, are predictable for ensembles of runs with the same value of s and 
p (section l6^ . Also, as discussed above, the transitions from the organized to the growth 
or random phases that involve the sudden extinction of a large number of species, might be 
predictable. 

9. Correlation between graph structure of perturbations and their impact on the evo- 
lution of the network. The network evolves by periodic perturbations, that involve the 
removal and addition of nodes, and there is a correlation between the graph theoretic nature 
of the perturbation and its short and long term impact. The perturbations can be broadly 
placed in two classes based on their effect on si: 

(i) 'Constructive perturbations': these include the birth of a new organization (an innovation 
of type 6), the attachment of a new node to the core (an innovation of type 4) and an 
attachment of a new node to the periphery of the dominant ACS (an innovation of type 2). 

(ii) 'Destructive perturbations': these include complete crashes and takeovers by dormant 
innovations (both caused by the loss of a keystone node), and takeovers by core-transforming 
innovations (innovations of type 5). The word 'destructive' is used only in the sense that 
several species become extinct on a short timescale (a single graph update) after such a 
perturbation. In fact, over a longer timescale (ranging from a few to several hundred graph 
updates), the 'destructive' takeovers by innovations usually trigger a new round of 'construc- 
tive' events like incremental innovations (type 2) and core enhancing innovations (type 4). 



9. 1 . Interesting features of the model 
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The maximum upheaval is caused by those perturbations that introduce new irreducible 
structures in the graph (innovations of type 4, 5 and 6) or those that destroy the existing 
irreducible structure. For example, the creation of the first ACS at n = 2854 triggered the 
growth phase in the run of Figure l53b . Other examples of large upheavals are core-shifts 
due to takeover by a core-transforming innovation at n = 6061, takeover by a dormant 
innovation at n = 5041, and a complete crash at n = 8233. 

10. Context dependence of the effective dynamics of the network evolution. It is char- 
acteristic of evolutionary systems, such as this model, that as different (graph) structures 
spontaneously appear, the nature of selective pressure on existing structures and hence the 
effective dynamics changes. In the present model the effective dynamics appears to be differ- 
ent in each of the phases even though the microscopic rules remain the same. Thus, in the 
random phase, because there is no ACS, the graph updates are almost random. In the growth 
phase there is competition between ACS and non-ACS structures. The dynamics essentially 
involves the growth of the dominant ACS by the accretion of nodes outside it. Once the 
entire graph becomes an ACS and the organized phase begins, the effective dynamics once 
again changes. Now there are no nodes outside the dominant ACS, yet some node has to 
be removed at each graph update. The competition now shifts to within the ACS - between 
core and periphery nodes. Typically the core nodes are protected by a higher relative popula- 
tion, therefore most of the graph updates involve periphery nodes. As the periphery realigns 
itself around stronger core nodes it can happen that a weak core node may be removed at 
some graph update. Over a long period of time such events may cause the core to become 
fragile and susceptible to core-shifts. Thus ACSs, which were robust in one phase, become 
fragile in another. Robus t and fragile structures are also found in highly designed systems 
jcarlson and Doyle . 1999 ). A different system, in which the selective pressures and ef fective 
dynamics change as different structures arise, is discussed by Cohen. M.. et al. ( 200ll ). 

11. Robustness of the ACS growth mechanism to changes in model rules. The basic 
model has a number of simplifying features that depart from realism but enhance analytical 
tractability. These are listed in chapter [HI along with variants of the model that relax those 
simplifications. As shown in that chapter, in most cases relaxing the various idealizations 
does not change the qualitative behaviour. A variant where the qualitative behaviour does 
change drastically is one where equation l|4.ip is replaced by the replicato r equation, making 
it an evolving version of the hypercycle model ([Eigen and Schuster!. Il979l ). In this case small 
(hyper)cycles do form but quickly get destroyed when a graph update creates a parasite. 
Consequently, large complex networks cannot form. However, the replicator equation is a 
special case of a more general equation. In the more general case, unlike the hypercycle 
model, the ACSs that arise do not get destroyed by parasites (nodes belonging to the pe- 
riphery). This robustness, one expects, would allow them to evolve into complex networks, 
as in the original model. 
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9.2 Implications for the origin of life 



As discussed in chapter [l] one of the possible scenarios for which the model might be relevant is 
for the evolution of a chemical network in a pool on the prebiotic Earth. From the point of view 
of the origin of life problem the main conclusions are: 

1. The model shows the emergence of a chemical organization where none exists. A small 
ACS emerges spontaneously by random processes and then grows until the entire system is 
autocatalytic. 

2. The graph structures that emerge (ACSs) are collective self-replicators even though none of 
their components (individual nodes) are self-replicators. 

As mentioned in section iLOl one of the puzzles of the origin of life is the emergence of structured 
non-random chemical organizations in a relatively short time. A highly structured organization, 
whose timescale of forming by pure chance is exponentially large (as a function of the size of the 
system), forms in this model on a very short timescale that grows only logarithmically with the size 
of the system. Under the assumption that the present model captures what happens in a prebiotic 
pool, the timescale for a fully autocatalytic chemical organization to grow in the pool is Tg = 
in units of the graph update time step. This time unit corresponds to the periodicity of the influx 
of new molecular species, hence it ranges from a day (for tides) to a year (for floods). Further, 
in the present model the 'catalytic probability' p is the probability that a random small peptide or 
RNA molecule will catalyze the production of another, and this has been estimated by iKaufFman 
( 1993 ) as being in the range 10~^ to 10"^'^. With this range of values for p, the timescale, Tg, for 



a fully autocatalytic chemical organization to grow in the pool lies in the range 10'' to 10^° years. 
The time t aken for the first cel l s to a rise on Earth, a few hundred million years after the oceans 
condensed () Joyce . llQRjJschonl l I1993I ). lies within this range. However, 10^ to lO^'' years is a very 
wide interval. A tighter prediction, and therefore a better test of the model, would require a more 
accurate value for the catalytic probability p for peptides and catalytic RNA, and a chemically more 
realistic model. 

These speculations might be complementary to some other approaches to the origin of life. 
Autocatalytic organization s of polypeptide s could ente r into symbiosis v yith t he autocatalytic cit- 



ric acid cycle proposed bv IWachterhauserl (|l99Ql ) and iMorowitz et al.l ()200d ). The latter would 
help produce, among other things, amino acid monomers needed by the former; the former would 
provide cata ysts for the latter. It is conceivable that lipid membranes (that have been argued by 
Isegre et a 1. 1, bonol. bonil. to have their own catalytic dynamics) could form and surround autocat- 
alytic sets, of the kind discussed here, in an enclosure. These 'cells' may contain different ACSs, 
or different parts of the same ACS, thereby endowing them with different fitness levels; such a 
population could evolve by natural selection. It is also conceivable that such autocatalytic sets of 
polypeptides formed an enabling environment for the formation and maintenance of self-replicating 
molecules such as those needed for an RNA world. 



9.3. Limitations of the model as a description of a chemical network 
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9.3 Limitations of ttie model as a description of a chiemical 
networl< 

I have interpreted the model as a chemical system, with nodes representing molecular species 
and the links representing catalytic interactions. This interpretation raises some questions. For 
example, what does it mean to remove a node from the graph? This effectively means that that 
molecular species cannot be produced again for a finite time until, at some later graph update, a 
new node comes in with exactly the same links as the one removed. If the species to be removed 
has a catalyst, then this is inconsistent with the assumption that there is a constant supply of 
all required reactants. In the random and growth phases this does not happen - in all cases the 
removed node either has no incoming links, or the Xi values of all its catalysts are zero. In the 
organized phase too, for runs with small p {ps < 1), a large fraction of graphs have si < s in 
which case the node that is removed (one of the nodes outside the dominant ACS) either has no 
incoming links or, if it does, all its catalysts have Xi = (otherwise the node would be part of the 
dominant ACS). However, when si = s, i.e., all nodes have non-zero Xi values, then the removed 
node always has a catalyst with a non-zero Xi value. Most of the time such removals do not 
significantly disturb the graph structure. However, in some cases they have drastic effects - for 
instance, complete crashes (section [7.4. Ijl . 

Another question concerns the structure of the molecules in the system. In this model, the 
identity of the molecular species is defined solely by their interactions and their structure is not 
specified at all. In reality, it is the structure of molecules that determines their chemical properties. 
The absence of structure in the model precludes its use in addressing a different problem of the 
origin of life: How did long proteins with strong and highly specific catalytic properties form from 
the short, weakly catalytic polypeptides existing initially? 

Other weaknesses of the interpretation of the model as a chemical system lie in the assumptions 
behind the functional form of fi, in particular, the dependence of Xi only on the relative populations 
of catalysts of i and not on the reactants. Such an assumption implies that the reactants are 
all present and have unchanging concentrations, i.e., they are buffered and there is effectively a 
constant supply of reactants. This can lead to the uncontrolled growth in the population of some, 
or all, molecular species. In reality, the reactants will eventually get consumed thereby truncating 
the growth of population. Further, the dependence of the growth rate on reactants would make 
the dynamics nonlinear and hence oscillations and chaotic dynamics might occur. The form of fi 
precludes the possibility of this kind of dynamics. The choice of fi also does not allow for the 
possibility of other types of chemical reactions, such as conversion of one substrate to another, or 
cleavage and ligation of molecules, which might be important for the prebiotic pool scenario. 
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Another limitation is the absence of spatial degrees of freedonn. Thus, the nnodel applies only 
to a well-stirred chennical reactor and cannot address the emergence of spatial structure and its 
effect on the dynamics of the system. As prebiotic pools need not be well- stirred i t woul d be more 



These are the most serious limitations. Apart from them, the model has several simplifications 
that aid in the analysis of its dynamical behaviour. These are listed below: 

1. All catalysts have the same catalytic efficiency. 

2. Inhibitors - negative links - are not present in the model. 

3. Self-replicators are not allowed. 

4. The selection is extremal; the node with the least Xi is removed at each graph update. 

5. The total number of nodes is fixed; the rate of node removals and node additions is exactly 



Variants of the model, in which the rules are modified to relax these simplifications, were described 
in chapter Island show the same qualitative behaviour as the original model. Therefore, these are 
not serious limitations. 



The above limitations of the model suggest possible directions in which it would be worthwhile 
to extend the model. Firstly, the structure of the molecular species could be explicitly introduced 
into the model. This would require specifying some rules to determine the catalytic properties of a 
molecule from its structure. The structure of the molecular species should determine the reactions 
in which the molecule will participate. Then there would be no need to remove a node from the 
graph. In such a model one could explicitly study the evolution of certain structural features of 
molecules in the system - for instance their length. 

Another direction in which to extend the model as a description of a chemical system would 
be to introduce the reactants explicitly, and also include other kinds of reactions, e.g., reactions 
in which one molecular species is converted to another type of molecular species. In the reductive 
citric acid cycle discussed above, a crucial reaction involves the splitting of citrate into two smaller 
molecules, which then undergo various reactions until each is re-converted into citrate. Therefore, 
including these kind of reactions is necessary to study the emergence of this kind of autocatalytic 
set in a chemical network. These changes fit within the general framework described in section 



appropriate to use reaction-diffusion equations to model the xi dynamics 




the same. 



9.4 Possible extensions of tlie model 



9.5. Modeling other systems within the same frameworl< 
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Going beyond that framework, spatial degrees of freedom could be added to the model by using 
reaction-diffusion equations for the Xj dyna mics. However, it is computationally difficult to solve 



such equations for complicated geometries ijPress et a 1. 1. Il992l ). A compromise might be achieved 
by simulating a finite number of compartments, within each of which the systern is wel l mixed, 
and allowing some exchange or diffusion across compartment boundaries jBhallal. boosi ). Then 
different ACSs could exist in different compartments and interact with each other. This is likely to 
be computationally easier to simulate and at the same time allows some spatial differentiation. 

9.5 Modeling other systems withiin thie same framework 

The general framework described in section [TtI can be used to model systems other than chemical 
networks. For example, an ecosystem could be modeled. In that case the nodes would represent 
biological species while the links would represent interactions such as prey-predator, competitive 
or symbiotic interactions between species. Here removing a node that has become extinct is not a 
problem because the species are self-replicators (once a species becomes extinct its chance of being 
recreated can be neglected). However, the self-replicating character of the species in this interpre- 
tation must also be reflected in the population dynamics. The Lotka -Volterra equation, replicato r 



equation or other more complicated dynamical equations may be tried ([Drossel and McKanel.l2QQ3l ). 
In modeling an ecosystem one might also wish to modify the way the new node is assigned links. 
While a random assignment might be interpreted as the invasion of the ecosystem by an entirely 
alien species, one might also want to include the possibility of an existing species mutating. 

Another system that might be modeled in this manner is a genetic regulatory network. The 
nodes would represent the genes. The Xi would represent the level of expression of a gene, and the 
links would represent regulatory interactions between the protein product of one gene and another 
gene. Here again one might wish to allow a new node (a new gene) to be a mutated copy of an 
existing gene. One might also wish to include other processes that change the graph such as gene 
duplications. 

A third way to modify the graph dynamics would be to allow the rewiring of existing links. This 
might be most relevant when using the model to describe an economic system, with the nodes 
representing economic 'agents' such as buyers and sellers, or companies, or nations. 
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Appendix A 

Proofs of Propositions 



Proposition 2.1: If a graph, C, 

(i) has no closed walk then Ai(C) = 0, 

(ii) has a closed walk then Ai(C) > 1. 

(i) If a graph has no closed walk then all walks are of finite length. Let the length of the longest 
walk of the graph be denoted r. If C is the adjacency matrix of a graph then {C^)ij equals the 
number of distinct walks of length k from node j to node i. Clearly C™ = for m > r. Therefore 
all eigenvalues of C™ are zero. If Aj are the eigenvalues of C then A,f are the eigenvalues of . 
Hence, all eigenvalues of C are zero, which implies Ai = 0. This proof was supplied by V. S. 
Borkar. 

(ii) If a graph has a closed walk then there is some node i that has at least one closed walk to 
itself, i.e. {C^)ii > 1, for infinitely many values of k. Because the trace of a matrix equals the 
sum of the eigenvalues of the matrix, X^*=i(C''')ii = '^'l^i A^, where Aj are the eigenvalues of C. 
Thus, Ei=i\*' > 1. for infinitely many values of k. This is only possible if one of the eigenvalues 
Ai has a modulus > 1. By the Perron-Frobenius theorem, Ai is the eigenvalue with the largest 
modulus, hence Ai > 1. This proof was supplied by R. Hariharan. □ 

Proposition 2.2: If all the basic subgraphs of a graph C are cycles then Ai(C) = 1, and vice 
versa. 

For any cycle of k nodes whose adjacency matrix is A, it follows that is the identity matrix 
{A^)ij = Sij (for each i there is a closed walk of length k from node i to itself, and there are no 
other walks of length k). Therefore, each eigenvalue. A, of A satisfies A'^' = 1, i.e., |A| = 1. The 
Perron-Frobenius eigenvalue of A must be real, therefore, Ai(^) = 1. Thus, if all basic subgraphs 
of a graph C are cycles then Ai(C) = 1. 

If Ai(C) = 1 then the basic subgraphs must be irreducible, they cannot be single nodes. As- 
sume that one of the basic subgraphs is not a cycle. Let its adjacency matrix be A. Because A is 
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a) b) 




Figure A.l: a. A graph that is an ACS, but not irreducible (e.g. there is no path from node 3 to 
node 1). b. A graph that is irreducible but is not a cycle. Removing the link from node 3 to node 
1 would convert it into a cycle. 



irreducible, every row has at least one non-zero entry. Construct A' by removing, from each row of 
A, all non-zero entries except one that can be chosen arbitrarily. Thus A' has exactly one non-zero 
entry in each row. Clearly the column vector x = (1,1,..., 1)^ is a right eigenvector of A' with 
eigenvalue 1 and hence Ai(^') > 1. Also A > A', therefore, by the Perron-Frobenius theorem for 
irreducible matrices, Xi{A) > Xi{A') Xi{A) > 1. But this contradicts the earlier assumption 
that Ai(A) = 1. Hence, if Ai(C) = 1, all the basic subgraphs must be cycles. □ 

Proposition 3.1: (i) All cycles are irreducible graphs and all irreducible graphs are ACSs. 
(ii) Not all ACSs are irreducible graphs and not all irreducible graphs are cycles. 

(i) Clearly in a cycle, there is a path from every node i to any other node j hence all cycles are 
irreducible. In every irreducible graph each node must have an incoming link from one of the other 
nodes. Hence all irreducible graphs are ACSs. 

(ii) This is proved by providing two counterexamples. Figure lA^lfc shows an ACS that is not an 
irreducible graph, and Figure IaH j shows an irreducible graph that is not a cycle. □ 

Proposition 3.2: (i) An ACS must contain a closed walk. Consequently, 

(ii) If a graph C has no ACS then Ai(C) = 0. 

(iii) If a graph C has an ACS then Ai(C) > 1. 

(i) Let A be the adjacency matrix of a graph that is an ACS. Then by definition, every row of 
A has at least one non-zero entry. Construct A' by removing, from each row of A, all non-zero 
entries except one that can be chosen arbitrarily. Thus A' has exactly one non-zero entry in each 
row. Clearly the column vector x = (1, 1, ... , 1)^ is an eigenvector of A' with eigenvalue 1 and 
hence Ai(A') > 1. Proposition 2.1 therefore implies that A' contains a closed walk. Because the 
construction of A' from A involved only removal of some links, it follows that A must also contain 
a closed walk, (ii) and (iii) follow from part (i) and propositions 2.1(i) and (ii), respectively. □ 
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Proposition 3.3: If Ai(C) > 1, then the subgraph of any PFE of C is an ACS. 

Let X be a PFE of a graph. Renumber the nodes of the graph so that > only for i = 1, . . . , A;. 
Let C be the adjacency matrix of this graph. As x is an eigenvector of the matrix C 

s 

k 

Because xi > Q for i = 1, . . . , k and Ai 7^ 0, it follows that Yl^=i ^ij^j > ^^^^ i G {1, . . . , A;}. 
Therefore, for each i E {1, . . . ,k} there exists a j such that > 0. Hence the k x k submatrix 
C' = (cij), i,j = 1, . . . ,k, has at least one non-zero entry in each row. Thus each node of the 
subgraph corresponding to this submatrix has an incoming link from one of the other nodes in the 
subgraph. Hence the subgraph is an ACS. □ 



Lemma A: Let ^ be a submatrix of a non-negative matrix B. Denote the Perron-Frobenius 
eigenvalue of A by \i{A) and that of B by \i{B). Then Xi{A) < \i{B). 



Renumber the nodes so that B can be written as: 

/ 

B = 



Consider the matrix 



B' 



A 


Ci 


C2 


C3 


A 












and denote its Perron-Frobenius eigenvalue by Xi{B'). 

Because B' < B we have Xi{B') < Xi{B) from the Perron-Frobenius theorem. 
Also Xi{B') = max{0,Ai(A)}. 
Hence, Xi{B') = Xi{A). 
Therefore Xi{A) < Xi{B). 



□ 
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Proposition 3.4: Let x be a PFE of a graph C , and let C denote the adjacency matrix of the 
subgraph of x. Let Ai(C") denote the Perron-Frobenius eigenvalue of C . Then Ai(C") = 
Ai(C) and C must contain at least one of the basic subgraphs of C . 

Let the nodes be numbered such that > only for z = 1, . . . , A;. Because x is an eigenvector of 
the matrix C we have X]^=i CijXj = XiXf, i = 1, . . . , s (Ai(C) has been abbreviated to Ai). 
=r- y^j— 1 CijXj = XlXi', i = 1, . . . , /c. 

Therefore {xi,X2, ■ ■ ■ iXkf'' is an eigenvector of the submatrix C = (cij), i,j = l,...,k, with 
eigenvalue Ai. Also by lemma A, Ai(C) < Ai. Therefore Ai(C") = Ai. It follows that the 
subgraph C must contain a strong component with Perron-Frobenius eigenvalue Ai. That strong 
component would be present in C also and would be one of it's basic subgraphs. Hence there 
exists at least one basic subgraph of C that is non-zero in any PFE of C. □ 

Lemina B: Let C be an s x s non-negative irreducible matrix with Perron-Frobenius eigenvalue 
Ai and let E = Adj{C - Xil). Then either E > or E < 0. 

As E is the adjoint of C - X J and \C - Xil\ = 0, it follows that E(C - Xil) = and 
{C — XiI)E = 0. Hence any column of E is either a PFE of C or is a column of zeroes. If it 
is a PFE then by the Perron-Frobenius theorem it is either strictly positive or strictly negative. A 
similar result holds for the rows of E. Hence either E = or E has all elements non-zero. If E 
has all elements non-zero then it follows that either > or < 0. 

I now show that one element of E is non-zero. The (s, s) element of E '\s \D — Xil\ where D 
is the matrix C with the last row and column deleted. Consider the matrix 





f 


\ 




D 





c' = 






\ 


0/ 



Clearly C < C . Because C is irreducible it cannot have a zero column or row; if it did there would 
be no path to or from that node to other nodes. Hence C / C , which implies, as a consequence 
of the Perron-Frobenius theorem for irreducible matrices, that Ai is not an eigenvalue of C and, 
therefore, is not an eigenvalue of D either. Hence Egs = \D — X\I\ ^ 0. 

Therefore, either E > Q or E □ 
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Proposition 3.5: If a node is non-zero in a PFE tlien all nodes it has access to are non-zero in 
that PFE. 

Let i he a node that is non-zero in the PFE x. Let j be any node downstream from it. Therefore 
there exists a positive integer k such that {C^)ji > 0. 
Assume that xj = 0. As J2i=i{C'^)jiXi = XiXj, 

=^ {C^)jlXi = 0, for each I = 1, . . . , s. This is a contradiction because > and {C^)ji > 0. 
Hence Xj > 0. □ 



Proposition 3.6: If a node is upstream from a basic subgraph then that node is zero in any PFE. 

Let the nodes be renumbered so that the last few - nodes r, r -|- 1, . . . , s - form a basic subgraph 
of the graph C . Let node p be some other node that is upstream from this basic subgraph, i.e., 
there exists a positive integer k such that {C^)rp > 0. Let x be a PFE of C. Then: 



— \\xi] i — 1, . . . ,s. 
The equations for z = r, r -|- 1, . . . , s can be written in matrix form as follows: 



/ X, \ 



A 



\ Xr-l J 



0, 



where Aij = {C'')i+r-i,j {i = 1, ■ ■ ■ ,m;j = 1, . . . ,r - l;m = s - r + 1), 

and Dij = Ci+r-i,j+r-i {hj = 1, ■ ■ ■ ,m) is the adjacency matrix of the basic subgraph. 



^ Xr ^ 
\Xs J 



( h\ 



(A.l) 



with b = (6i,62,...,6„0^< 0,7^0. 

Let us assume that node p is non-zero in the PFE, i.e. Xp > 0. Also {A)ip = {C^)rp > 0, 
therefore hi < 0. 

Let Ri denote the i*'' row of D'^ - Xp. 
Because - Xfl\ = 0, 

m 

^R, = Y^ajRj, (A.2) 

i=2 

with at least one aj being non-zero. 
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l)A.l|) gives 

m 

'^{Rj)iXi+r^i = bj; j = l,...,m. 

i=l 

Multiply the above equation by aj and sum from j = 2 to j = m. 



m 



From (I A. 211 we have 



j=l j=2 i=2 
m m 

m 

i=2 



^ (D^' - Af/)i, = ajiD'^ - (A.4) 
Let ^ = Adj^D'^ - By definition 

m 

E^J{D'' - = \D'' - A^/| =0; i = l,...,m, 

m p 



Comparing with ()A.4|1 we have 



a, =--^<0, 



because from lemma B either > or < (as D is irreducible it follows that is irreducible 
hence lemma B is applicable). 

i=2 



Therefore from ljA7 

^ 6i > 0. 

However, we are given that hi < 0. Hence the assumption that Xp > has led to a contradiction. 
The only other possibility is that Xp = in the PFE. □ 
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Proposition 3.7: If a node does not have access from a basic subgraph then it is zero in any PFE. 



Let the k x k submatrix A consist of all the strong components that do not have access from any 
basic subgraph of C. The matrix C can be written as: 



C 



A 





R 


B 



\ 



I 



where B consists of all the basic subgraphs of C and all the nodes having access from them. Let 
X be a PFE of C . The eigenvector equations give: 



{A - All) 



(A.5) 



A has no basic subgraphs of C, hence Ai is not an eigenvalue of A. Therefore ()A.5|1 has only the 
trivial solution Xi = 0; i = 1, . . . , k. □ 



Theorem 3.1: Eigenvector profile theorem 

For any graph C, determine all the basic subgraphs of C. Denote them by Di, . . . , Dk- 
Determine which of these does not have any other Di downstream from it. Denote these by 
El, . . . , En- Then: 

(i) For each i = I, . . . , N there exists a unique (upto constant multiples) PFE in which the 
nodes of Ei and all nodes having access from them are non-zero and all other nodes are 
zero. It is evident that these PFEs are simple. Moreover these are the only simple PFEs of 
the graph C. 

(ii) Any PFE is a linear combination of these N simple PFEs. 



The first statement follows from propositions 3.4-3.7. To prove the second statement, I first write 
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the matrix as follows: 



C 



\ 



A 










El 








P 






En 





Q 


R 


D 



J 



A contains all the nodes i = 1, . . . ,p — 1 that don't have access from any Ei. Thus all nodes in 
A will be zero in any PFE x. D contains all nodes i = p + m + l,...,s having access from some 
Ei, i.e. i? / 0. As xi, . . . ,Xp_i = the eigenvector equations for each Ei are: 



{Ei - Xil) 



( X, \ 



Because each Ei is irreducible the above equations have solutions that are unique upto constant 
multiples. Denote these solutions by x- \ j = . . . ,li + rm. The equations for D are: 



R 



+ {D- All) 



0, 



{P - All) 



where b < 0,/ 0. 



Consider the case where only one of the Ei is non-zero. Denote the b for that case by b(') and 
the corresponding (unique) solution of the above equations by x^*^; j = p + m + 1, . . . , s. For the 
general case where more than one Ei is non-zero, b will be a linear combination of the b^*^ because 
Xfc, for each node k in each Ei, is unique upto constant multiples. He nce the general so lution for 
Xp+m+1, ■ ■ ■ will also be a I inear combination of a^p_j_^_|_]^, . 
a different proof that uses the principle of mathematical induction. 



RothblumI (|l975l ) provides 

□ 
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Proposition 3.8: Every node in the core of a simple PFE has access to every other node of the 
PFE subgraph. No periphery node has access to any core node. 

Fronn theorem 3.1 it follows that the subgraph of any simple PFE is the subgraph induced by the 
nodes of one of the basic subgraphs Ei (defined in the statement of theorem 3.1) and all nodes 
having access from Ei. The core nodes are the nodes of E^. Because Ei is irreducible, it is evident 
that every core node has access to every other core node. Further, every periphery node has access 
from some, and therefore all, nodes of E^. Thus, each core node has access to every other node 
of the PFE subgraph. 

Now assume there is some periphery node p that has access to some core node r. Also r has 
access to p, therefore p is strongly connected to r. In that case p would also be part of the basic 
subgraph Ei, i.e., p would be a core node, not a periphery node. Thus, no periphery node can 
have access to any core node. □ 



Proposition 4.1: For any graph C, 

(i) Every eigenvector of C that belongs to the simplex J is a fixed point of equation ()4.1jl . 
and vice versa. 

(ii) Starting from any initial condition in the simplex J, the trajectory converges to some 
fixed point (generically denoted X) in J. 

(iii) For generic initial conditions in J, X is a Perron-Frobenius eigenvector (PFE) of C. 

(iv) If C has a unique (upto constant multiples) PFE, it is the unique stable attractor of 
equation l|4.1jl . 

(v) If C has more than one linearly independent PFE, then X can depend upon the initial 
conditions. The set of allowed X is a linear combination of a subset of the PFEs. The interior 
of this set in J may then be said to be the 'attractor' of 1)4. l|l . in the sense that for generic 
initial conditions all trajectories converge to a point in this set. 



The proof uses the Jordan canonical form of the matrix C ()Horn and Johnsonl. Il985l ) 



/ ^.i(Ai) 



J 



\ 



with si + S2 + . . . + Sk = s. 



142 



Appendix A. Proofs of Propositions 



Each Jsi(Aj) is called a Jordan block and is an Si x Si matrix of the fornn: 

\ 



/ Xi 1 

Xi 1 
A, 



JsAX^) 



\ 



1 

Xi ) 



The Xi are the eigenvalues of C . The Aj need not be distinct; there nnay be more than one Jor- 
dan block with the same diagonal. It is known that any complex matrix can be put in the above form 
by a similarity transformation, i.e. there exists a non-singular matrix P such tha t C = PJP~^, a nd 



more over, the form of J is unique barring rearrangements of the Jordan blocks ()Horn and Johnson 
19851 ). The columns Pj of P form k 'Jordan chains' {pi, . . . ,Pg^},{pgj^^i, . . . ,Pg^_^_g^}, . 
{Ps_sj.+i, • • • , Ps} that satisfy the following equations: 



Cpi = Aipi^ 

CPi = AiPi + Pi_i; i = 2,. 
CPsi+l = A2PS1+1, 

CPsi+i = X2Psi+i + Psi+i-l! 



,■51, 



-.52, 



CPs-Sk + 1 



AfcPs-s^+j + Ps-Sfc+j-li 



2, . . . 



Thus Pi, Ps^+i, • • • , Psk-i+i ^''^ eigenvectors of C with eigenvalues Ai, A2, • • • , A^ respectively. It 
can be shown that there is one and only one linearly indepen dent eigenvector of C associated with 
each Jordan block and vice versa ()Horn and Johnsonl . 119851 ). The other columns of P are called 
'generalized eigenvectors' of C. 

Now consider the underlying dynamics (|4.2)i from which (|4.1jl is derived: Because ()4.1jl is 
independent of (p, we can set <^ = in ()4.2p without any loss of generality. With (f> = the general 
solution of ()4.2|1 . which is a linear system, can be schematically written as: 



y(i) 



y(o), 



where y(0) and y(t) are column vectors. 
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Using the Jordan canonical form of C in the general solution of 1)4. 2|l we find: 

y(t) = Pe-^*p-V(0). 



Now 



/ eJsi(Ai)t 



e^2 



V 



where 



/I t ^ 
1 t 



\ 



1 / 



Let 



Y = P lim e 



Jtp-1 



y(o). 



Then the attractor of ()4.ip will be X = / ^t^iYi and only the fastest growing connponents in 
Y will contribute to X. Therefore, in calculating Y, I will keep only the fastest growing terms and 
set all others to zero for convenience. 

First consider the case where y(0) is unrestricted - any initial condition is allowed. Then some 
components of P~^y(0) could be zero. If all the components corresponding to a particular Jordan 
block are zero in P^^y(O) then that Jordan block will not contribute to e"'*P^^y(0). Further, 
among the Jordan blocks that do have some non-zero components in P~^y(0), only the ones with 
the largest Aj will survive in the t — > oo limit because of the e^'* term. Let the Jordan blocks 
that survive be the first K ones (the nodes can always be renumbered to achieve this). From the 
structure of e"^"'^^'^* it then follows that the only non-zero components of e'^*P^^y(0) will be the 
components 1, 1 + si, 1 + si + S2, • • • , 1 + Ylf=i^ ^j- Therefore Y = t^e^^ c^pj, with i being 
summed over the set {1, 1 + si, 1 + si + S2, • • • , 1 + J2f=i^ ^j}- Cj, r, A are constants and Pi is 
the ith column of P. Then, the attractor is X = co(cipi + ci+siPi+si + • • • + CkPk), where cq is 



K-l 



Sj. Notice that pi,pi+si, 



,Pk 



a constant chosen to make J2i=i -^i — ^"^^ ^ = 1 + X] 
are all eigenvectors of C with eigenvalue A. Therefore X is a fixed point, and is an eigenvector of 
C. This proves (i) and (ii). 



144 



Appendix A. Proofs of Propositions 



Now consider the case of generic initial conditions. In this case P~^y(0) > 0. The only terms 
to survive will be in the Jordan blocks corresponding to the Perron-Frobenius eigenvalue, Ai, of 
C. Moreover, of these blocks only the largest sized blocks will survive. Suppose the largest Jordan 
blocks corresponding to eigenvalue Ai have size r. Let the Jordan blocks be arranged such that 
these blocks are the first few. Then, in the limit t ^ ""^^ 

/O ... 1 



... 



e 



(r-l)! 



oo, e will have the following form: 

0\ 



... 1 



... 



... 



... 



Thus if there are I Jordan blocks of size r corresponding to eigenvalue Ai then, for the purpose 
of calculating X, all elements of c^* can be set to zero except the elements (1, r), (r + 1, 2r), . . . , 



{{I — l)r + l,lr) which will be equal to 



(r-l)! 



Therefore the only non-zero elements of 



e'^*p-iy(0), in the t ^ oo limit, will have values Cife^^/{r - 1)! for i = 1, r + 1, . . . , (I - l)r + 1, 
where Cj are real constants that depend on P~^y(0). Thus, 



Y = P lim e'^^p- V(0) ~ / ^ * . 
t^oo ' ir-l)\ 



(ciPi + Cr+lPr+l + . . . + C(i-i)r+i^{i-i)r+i). 



Therefore, 



X = Co(ciPi + Cr+lPr+l + • . . + C(;_i)r_,_iP(/_i)r_,_i), 



where cq is a constant chosen to make — 1- Thus X is a linear combination of those 

PFEs of C that correspond to the largest Jordan blocks with eigenvalue Ai. This proves (iii) and 
(iv). □ 
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Theorem 4.1: Attractor profile theorem 

1) Determine all the strong components of the given graph C. Denote these by Ci, . . . , Cm- 

2) Determine which of these are basic subgraphs. Denote them by Di, . . . , Dk- 

3) Construct a graph, denoted D* , with K nodes, representing the Di, and a link from node 
j to node i if there is a path from any node of Dj to any node of Di that does not contain 
a node of any other basic subgraph. 

4) Determine which of the Di are at the ends of the longest paths in the above graph D* . 
Denote these hy Fi,i = \, ... ,N . 

For each i = 1,...,A^ there exists a unique (upto constant multiples) PFE that has only 
nodes of Fj and all nodes having access from them non-zero, and all other nodes zero. The 
attractor set consists of all linear combinations of these PFEs that lie on the simplex J. 

It has already been shown (proposition 4.1) that the attractor X is a linear combination of those 
PFEs of C that correspond to the largest Jordan blocks with eigenvalue Ai. Therefore to prove 
this theorem it suffices to show that the PFEs corresponding to the largest Jordan blocks with 
eigenvalue Ai are precisely those PFEs corresponding to the subgraphs Fi that are at the ends of 
the longest paths in D* . 



It can be shown that the size of the large st Jordan b 



Rothblum 



ock with eigenvalue Ai is equal to the 



19751 . where it is called the index of C). 



length of the longest path in the graph D* (see 
It is also shown by Rothblum that for each basic subgraph Di there is a sequence of generalized 
eigenvectors y^, . . . ,y^, where h is the length of the longest path in D* starting at Di, such that 
{C - Ai/)yi = 0, (C - Ai/)y'= = y^-^ (where A; = 2, . . . , /i) and yj^ > if and only if the longest 
path from a node of Di to node j when projected onto the graph D* has a length k. 

Let the length of the longest path in D* be r and consider one such path starting at, say, Dj 
and ending at some Fi. Then, corresponding to this path, there is a sequence of r generalized 
eigenvectors that includes the PFE in which only Fi and nodes having access from it are non- 
zero. Thus there is a Jordan block of size r corresponding to the PFE in which only Fi and 
all nodes having access from it are non-zero. This is also the largest Jordan block. Therefore 
the PFEs corresponding to the largest Jordan blocks with eigenvalue Ai are precisely those PFEs 
corresponding to the subgraphs Fi that are at the ends of the longest paths in D* . The attractor 
set will consist of linear combinations of these PFEs. □ 
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Proposition 4.2: For any graph with Ai(C) = 0, in the attractor only the nodes at the ends of 
the longest paths are non-zero. All other nodes are zero. 

Consider a graph consisting only of a linear chain of r + 1 nodes, with r links, pointing from node 
1 to node 2, node 2 to 3, etc. The node 1 (to which there is no incoming link) has a constant 
population yi because the r.h.s of l|4.2jl vanishes for i = 1 (taking </> = 0). For node 2, we get 
y2 = yi, hence y2{t) = 2/2(0) + yit ~ t for large t. Similarly, it can be seen that yu grows as t^^"^ . 
In general, it is clear that for a graph with no cycles, yi ~ f for large t (when (/> = 0), where r is 
the length of the longest path terminating at node i. Thus, nodes with the largest r dominate for 
sufficiently large t. Because the dynamics l|4.ip does not depend upon the choice of Xj = for 
all i except the nodes at which the longest paths in the graph terminate. □ 

Proposition 4.3: For any graph C, 

(i) For every X belonging to the attractor set, the set of nodes i for which > is the 
same and is uniquely determined by C. The subgraph formed by this set of nodes will be 
called the 'subgraph of the attractor' of l|4.1jl for the graph C. 

(ii) If Ai(C) > 1, the subgraph of the attractor is an ACS. 

This is a corollary to theorem 4.1. The set of nodes j for which Xj > consists of nodes 
in all the subgraphs Fi described in the statement of theorem 4.1, and all nodes having access 
from them. This set evidently depends only on C. Further X is a PFE of C. Therefore, from 
proposition 3.3 the subgraph of the attractor X is an ACS if Ai(C) > 1. □ 

Proposition 6.1: Ai is a non-decreasing function of n as long as si < s. 

Let the adjacency matrix of the graph at time step n be denoted C„ and its Perron-Frobenius 
eigenvalue be denoted Ai(C„). Removing a node with the least Xi from this matrix gives an 
(s — 1) X (s — 1) matrix that I denote T. 

From lemma B, Ai(r) < Ai(Cn). 
Because si < s the removal of the node with the least Xi leaves the dominant ACS (denoted £>„) 
unaffected, i.e., Dn is a subgraph of T. 
Ai(L>„) < Ai(r) < Ai(C„). 

However Ai(C„) is an eigenvalue of Z)„. Hence, Ai(D„) = Ai(C„), 

Ai(r) = Xi{Cn). 

Now a new node is added to T to get C„+i. Because T is a subgraph of C„+i, it follows that 

Ai(C„+i) > Ai(T), 

Xl{Cn+l) > XliCn). □ 
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Proposition 7.1: Let Nn denote the maximal new irreducible subgraph that includes the new 
node at time step n. Nn will become the new core of the graph, replacing the old core 
Qn-i, whenever either of the following conditions are true: 

(a) Ai(iV„) > Ai(Q;,) or, 

(b) Xi{Nn) = Xi{Q'^) and is downstream of Q'^. 

After a node is removed from the graph C„_i, the resultant graph is C'„ which has the core Q'„. 
Now a new node is added to this graph to get C„. This graph contains the irreducible subgraph 
Nn- Clearly if Xi{Nn) > Ai(Q^) then Nn is the only basic subgraph of Cn and will therefore be 
its core. But if Xi{N„) = Ai((5^) then it is only one of the basic subgraphs of C„. By definition 
Q'^, because it is the core of C^_i, contains those basic subgraphs that are at the ends of the 
longest chains of basic subgraphs in Therefore, Nn will be a basic subgraph at the end of 

the longest chain of basic subgraphs in C„ only if it is downstream from some node of Q'n- If this 
is so, then it will be the single basic subgraph at the end of the longest chain, and therefore will 
be the core of Cn. □ 
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Appendix B 

Programs to simulate the model 



This appendix describes two programs, included in the attached CD, that can be used to simulate 
the model and its variants. The programs differ in the way the attractor is found at each time step. 
The first program uses theorem 4.1. The second numerically integrates equation l|4.1jl to find the 
attractor. These different methods will be described below, in sections rB.4l and (BTI but first I will 
describe what is common to the two programs. Both programs are written in C++, however they 
can be easily translated to C because I have used only certain small convenient features of C++ 
and no object oriented code. 



B. 1 Variables and data types 

The two parameters of the model are s, the total number of nodes of the graph, which is specified 
by the variable named n of type int (integer), and p, the catalytic probability, which is specified 
by the variable named Iprob of type double (double precision real number). 

The dynamical variables are the graph C, whose adjacency matrix is stored as a two dimensional 
s X s array of type double, and the relative populations Xi which are stored as a one dimensional 
array of type double. 



B.2 Creating thie initial random graph) 



The initial random graph is created in the function inakec4. For each s}, I generate a 

(pseudo)random number of type double uniformly distributed in the interval [0,1]. If the random 
number is less than p then Cij is assigned unity, otherwise it is assigned zero. For generating 
(pseudo)random numbers uniformly d i stribu ted in the interval [0,1] I have used the routine rani 

t al.l. I1992I ). However, as Press et al. do not allow Numerical 



from Numerical Recipes ()Pre 



Recipes routines to be distributed publicly, I have replaced calls to rani in the programs included 
in the attached CD by calls to the internal random number generator drand48. The function call 
is drand48() and returns a number of type double uniformly distributed in the interval [0,1]. 
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Appendix B. Programs to simulate the model 



The variable idum of type long (long integer) is used to set the initial seed of the random number 
generator using the function srand48. 

B.3 The graph update 

Once the attractor is known, the set L of nodes with the least Xi has to be determined. Because 
the Xi are stored as data type double it makes sense to choose a threshold such that two numbers 
differing by less than the threshold will be considered equal. In the programs, the threshold has been 
chosen to be 10~^. Note that in the case where theorem 4.1 is used to determine the attractor, 
the set £ is known exactly in most cases; there are few situations where the threshold is actually 
required. 

After the set £ is determined, one picks an arbitrary node out of this set for removing from 
the graph. This node is picked by choosing a random number from the set {0, 1, . . . , / — 1}, where 
I is the size of the set L (if r is a random number in the interval [0, 1] generated using rani or 
drand48, then int(l*r) is a random integer between and /, inclusive). 

Once the node to be removed is decided, say node k, the graph update is done by the function 
call update4(k). All the elements of the matrix in that row and column are reassigned to or 1 
by the same procedure used to create the initial matrix, i.e., a random number is generated and 
for each i ^ k, cik is assigned unity if the random number is less than p, and is assigned zero 
otherwise, and similarly for Cki- The function update4 also sets Xk to xq (the variable xin of type 
double) and rescales all other Xj so that Yl,i=i = 1- 

B.4 Program that uses the attractor profile theorem 

B.4. 1 Determining wliicli node to remove 

The crucial step involves the determination of the set, C, of nodes with the least Xi at each time 
step. The results from chapterlH in particular theorem 4.1, are used to do this. 

1. Determine all the strong components of the given graph. Denote these by Ci, C2, ■ ■ ■ , Cm- 

2. Determine which of these are basic subgraphs. Denote them by Di, . . . , Dk- 

3. Construct a graph, D*, of K nodes representing the Di with a link from node j to node i 
if there is a path from any node of Dj to any node of Di that does not contain a node of 
any other basic subgraph. 

4. Determine which of the A are at the ends of the longest paths in the above graph D*. 
Denote these by Fj. 



B.4. Program that uses the attractor profile theorem 



151 



5. For each Fi construct a vector with (arbitrary) non-zero values assigned to those components 
corresponding to nodes in and all nodes having access from them. Set all other components 
to zero. 

6. (a) If there are any nodes that are zero in all these vectors then they form the unique set C 
for generic initial conditions. 

(b) If no nodes are zero, then first the PFE corresponding to each Fi is determined using 
Matlab. This is done by constructing the adjacency matrix of the subgraph induced by all 
nodes in Fi and all nodes having access from them and using Matlab's eig function to 
determine the (unique) PFE of this matrix. Then, 

If there is only one Fi, its PFE is used to determine C Because its PFE is unique upto 
constant multiples, the set C is also unique and independent of the (generic) initial conditions. 
If there is more than one Fj, then C depends on the initial conditions. In this case is 
taken to be a linear combination of the PFEs corresponding to each Fi, with the coefficients 
of the linear combination being random numbers generated by rani or drand48. The set C 
is then determined from these values of Xj. 

This last case only occurs for graphs in which several ACSs coexist and the whole graph is spanned 
by these ACSs. Such a situation occurs very rarely. Thus when si < s the set C is unique and 
independent of initial conditions. When si = s most of the time there is only one Fi in which 
case again C is unique and independent of initial conditions but has to be determined numerically 
(using Matlab). Finally, very rarely, when si = s and there is more than one Fj in the graph then 
the set jC depends on the initial conditions and an arbitrary linear combination of PFEs is taken as 
the attractor. The function that implements this algorithm in the program is called thmeig. 

Note that the code was written to work with Matlab 5.2 and therefore may require some 
modifications to work with other versions of Matlab. 

B.4.2 Output files 

Having specified the values of the variables n (the size of the graph), Iprob (the catalytic probabil- 
ity), xin (the relative population assigned to each new node) and idum (the initial random seed), 
the program produces a run with the specified number of graph updates. Several output files are 
created, which consist of one or more columns of data. Each row corresponds to an iteration of 
the graph dynamics and different filenames are used for storing different quantities. The list below 
shows the naming scheme I have used for the output files: 

1. *.l : the total number of links. 

2. *.px : column 1 has the number of nodes with Xj > 10"^, column 2 has the number of 
nodes with Xj > calculated (using theorem 4.1) in thmeig if there is an ACS, or -1 if 
there is no ACS in the graph. 
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3. *.lm : the Perron-Frobenius eigenvalue of the graph. 

4. *.inin : information about each graph update, which node was replaced, which new links 
were assigned. This can be used to reconstruct all the graphs later using the program 
mtx.cpp included in the attached CD (see appendix C). 

5. * . clno : the number of irreducible subgraphs of the graph. 

6. *.clev : the Perron-Frobenius eigenvalue of each irreducible subgraph. 

7. * . cn : number of core nodes. 

8. *.cs : list of core nodes. 

9. *.Onuin : number of nodes with Xi = 0. 
10. *.Onodes : list of nodes with Xi = 0. 

Any filename can be substituted for *. I typically use a name that contains information about 
the values given to n, Iprob and idum, thereby allowing me to distinguish one type of run from 
another. 



B.5 Program that finds thie attractor numerically 

B.5. 1 Numerical method for determining tlie attractor 

As an alternative to the above program that uses theorem 4.1, I have included, in the attached 
CD, a program that numerically integrates equation ()4.1|l to find its attractor. The program uses 
a 5th order adapt ive Runge-Kutta scheme based on the algorithm given in Numerical Recipes 
( Press et al.[ Il992l ). The function that implements this in the program is called dyneig. A call to 
dyneig replaces the call to thmeig in the previous program. 

I have included this alternate program for two reasons. One, for several variants of the model, 
such as the variant with negative links (section theorems 3.1 and 4.1 do not hold. Then the 
only option is to use a numerical method to find the attractor - many of the variants discussed in 
chapter [HI have been simulated using this method. 

The second reason is historical. When I initially began working on this model, theorems 3.1 
and 4.1 did not exist (in my mind). All the initial runs were done using the numerical method. 
Some patterns were consistently observed in these runs. For example, whenever there was a 2-cycle 
downstream from another, and these were the only basic subgraphs, then the upstream 2-cycle was 
always zero. These observations were the stepping stones that led to the theorems. 



B.5. Program that finds the attractor numerically 
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B.5.2 Output files 

As in the previous program, the following variables must be set before running the program: n (the 
size of the graph), Iprob (the catalytic probability), xin (the relative population assigned to each 
new node) and idum (the initial random seed). For a given run, the program produces four output 
files, each of which consist of one or more columns of data, and are named as follows: 

1. *.l : the total number of links. 

2. *.px : column 1 has the number of nodes with Xi > 10~^, column 2 has a flag that is 1 if 
the set of nodes with > 10~^ is an ACS and otherwise. 

3. *.S1 : list of nodes having X^ > 10~^. 

4. *.niin : information about each graph update, which node was replaced, which new links 
were assigned. This can be used to reconstruct all the graphs later using the program 
mtx.cpp included in the attached CD (see appendix C). 
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Appendix C 

Contents of the attached CD 



The attached CD contains the following files: 

• contents: The contents of the CD; this list. 

• thesis.pdf: This thesis in PDF format. 

• thesis.ps: This thesis in Postscript fornnat. 

• Programs/thmruns . cpp: A program to simulate the model, that uses theorem 4.1 (see 
section EH). 

• Programs/rkSruns . cpp: A program to simulate the model, that uses a 5th order Runge- 
Kutta method to integrate equation ()4.1jl (see section fB^ . 

• Programs/mtx. cpp: A program to extract the adjacency matrix of the graph at any given 
iteration, from the *.inin file produced by thmruns.cpp or rkBruns.cpp. The program 
requires the seed used to initialize the random number generator for that run. 

• Prograins/mtx2gw. cpp: A program to convert the adjacency matrix files produced by 
mtx.cpp to the LEDA format. 

• Programs/calcdeg. cpp: A program to calculate the in, out and total degree distribu- 
tions for some or all graphs of a run from the *.min file produced by thmruns.cpp or 
rkSruns.cpp. The program requires the seed used to initialize the random number genera- 
tor for that run. 

• Programs/calcdep . cpp: A program to calculate the interdependency and dependency dis- 
tribution for some or all graphs of a run from the *.min file produced by thmruns.cpp or 
rkBruns . cpp. The program requires the seed used to initialize the random number generator 
for that run. 
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Programs/rndgraph . cpp: A program to produce a set of graphs drawn from the ensemble 
G^s, and to calculate the degree and dependency distribution. 

Data/tOlslnlOO.*: The data files produced by thmruns.cpp for a run with s = 100, p = 
0.001. This is the run displayed in chapter|ni(see Figures Isl and l53b ). 

Data/t025s*nl00 . *: The data files produced by thmruns.cpp for runs with s = 100, p = 
0.0025 and different seeds. These are the set of runs, totaling 1.55 million iterations, men- 
tioned in sections [6.3.2l I6.3.3l and 17.21 t025slnl00 . * is the run displayed in chapter|Hl(see 
Figures lOandlQj). 

Data/t05s3nl00 . *: The data files produced by thmruns.cpp for a run with s = 100, p = 
0.005. This is the run displayed in chapter |Hl(see Figures [5?2l and lOb ). Note that these 
data files, and the ones above, were produced by a version of thmruns.cpp that uses rani 
as the random number generator and, therefore, are likely to differ from runs with the same 
random seed that are produced by the program provided in the CD, which uses drand48. 

Data/crashes .txt: A list of iterations, for each of the runs t025s*nl00.* with s = 
100, p = 0.0025, at which the graph update resulted in a crash. The core overlap for each 
crash is also given. 

Graphs/n* .gw: The graphs of Figure [5?6l in LEDA format. 

Graphs/n* .mtx: The adjacency matrices of the graphs shown in Figure lS^ 
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esting — not least in illunninating the essentially feudal structure of lISc, designed to maintain, 
as the bottom rung, an inexpensive and grateful labour force. In addition, during my stint in 
the Scouncil, I benefited from four crash courses: Beautifying lISc in 30 Days, Eradicating Child 
Labour from lISc in Six Easy Steps, Practical Lessons for the Modern Witch-hunter, and How to 
Talk at Important Meetings for Dummies. 

If I survived coursework, it is both because of and despite my Integrated PhD classmates. Their 
names, in descending order of weirdness: 

Shubra > Prabudhha > Ayash > Dhruba > Pinaki > Nandan > Rangeet. 

This list misses out Sundar, who is no less weird, but his weirdness is orthogonal to the others. 
Thanks for (among many many other things that I can't remember right now) teaching me how 
to cycle. 

In these six years many friends have left for other places. I remember them with fondness 
and miss their company: Aditi, Niruj, Arti, Rjoy, Dandy, Sumana, Devyani, Sundar, Raju, Baliga, 
Poulose, Prithu, Raghavan, Subroto and Rsidd. Fortunately, new friendships have also formed: 
Osho, Ayesha, Moushumi and Natasha will be remembered for helping me and others during CTS 
parties, impromptu chocolate pastries, agreeing to be a murder victim and dropping a snake onto 
(well, close to) my foot, respectively. 

My memories of lISc will also include: heavy metal education at Styx, courtesy Niruj; Dandy 
taking us to the edge of urbanity; Vatsa's views on mixing quantum field theory with alchohol; 
Raju and Negi frightening me out of my wits by mistaking a cow for an elephant; and the pool 
incident, which deserved a quieter night. 

Much of my time recently has been spent in the welcome company of Bhavtosh, Moyna, Raghu, 
Joby, Toby and Vivek. Thanks to Vivek for catalyzing puns and other wordsmithings. Perhaps one 
day we'll finish the movie, the play and some junk we started. Toby and I share a common love 
for Calvin and Hobbes. From Chembra to World Cups to Rallyk and 'Walk in the Woods', he's 
always been great company and great competition. Joby has made a positive difference to the 
lives of many people and he can count me among them. The small amount of time I have spent 
at the tmsc classes and the construction colony with him have been more educative than all the 
coursework I did in IISc. I will miss Joby every time I see a bird that I can't identify, or hear a bird 
that I can't see. 

It's difficult to adequately describe the kind of crucial life support those closest to me have 
provided. I hope they will extrapolate from the few words I have managed to put down. My 
parents have, successfully, tread the narrow path between pressurizing me with their opinions and 
abandoning me to my own decisions. Bhavtosh, Moyna and Raghu: If at all we drift apart, I hope 
it will only be geographically. 



